Crates.io | gfa-reader |
lib.rs | gfa-reader |
version | 0.1.4 |
source | src |
created_at | 2024-01-15 13:00:52.054608 |
updated_at | 2024-01-15 13:00:52.054608 |
description | Reading gfa format v1 |
homepage | |
repository | https://github.com/MoinSebi/gfaR |
max_upload_size | |
id | 1100318 |
size | 2,537,952 |
Supports GFA versions 1.0, 1.1, 1.2 and 2.0 in a single structure.
GFA format specification:
gfa-reader = { git = "https://github.com/MoinSebi/gfa-reader", branch = "main" }
OR
gfa-reader = "0.1.4"
gfa-reader has two main structs: Gfa for versions 1.0, 1.1, 1.2 and 2.0, and NCGfa for version 1.2 or lower.
Gfa is the basic implementation for all versions and all kinds of node ids. As stated in the specification, node ids can be numeric or alphanumeric, so they are represented as a String in our implementation. This can lead to increased memory usage.
NCGfa (NumericCompact) is a compact representation of the graph with numeric and compacted (starting at 1) node ids. It can be used for variation/genome graphs from pggb or minigraph-cactus.
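To illustrate what "numeric and compacted" node ids mean, here is a minimal sketch in plain Rust. It does not use gfa-reader itself; the function name `compact_ids` is hypothetical, and assigning ids in order of first appearance is an assumption made for the example, not necessarily how NCGfa does it.

```rust
use std::collections::HashMap;

/// Hypothetical helper: map arbitrary (possibly alphanumeric) node ids to
/// dense numeric ids starting at 1, in order of first appearance.
fn compact_ids(ids: &[&str]) -> HashMap<String, usize> {
    let mut map = HashMap::new();
    for id in ids {
        // The next free compact id is one past the number of ids seen so far.
        let next = map.len() + 1;
        // Only insert if this id has not been seen before.
        map.entry(id.to_string()).or_insert(next);
    }
    map
}
```

With dense ids starting at 1, per-node data can live in a plain vector indexed by `id - 1` instead of a hash map keyed by String.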
Several GFA entries have optional fields. Most of the time, these fields are not needed for the basic graph structure, so they can either be parsed when needed or skipped. This option is set once and applies to all entries: optional information is parsed either for every entry or for none.
The same is true for overlaps: they are optional in many GFA entries and can be parsed or skipped. They are usually not needed for the basic graph structure and can be left out, as above.
In some cases, edges are not needed because the graph structure can be derived from the path information. The edges struct is always present but, depending on the parse settings, may never be populated.
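The idea of a single switch that decides whether optional fields are kept can be sketched as follows. This is a standalone illustration, not gfa-reader's API: the `Segment` struct, the `parse_segment` function and the `keep_optional` flag are hypothetical names, and only the GFA v1 segment line layout (`S`, name, sequence, then optional tags, tab-separated) is assumed.

```rust
/// Hypothetical, simplified segment record for this sketch.
#[derive(Debug, PartialEq)]
struct Segment {
    name: String,
    sequence: String,
    /// `None` when optional fields were skipped during parsing.
    tags: Option<Vec<String>>,
}

/// Parse one tab-separated GFA v1 `S` line.
/// Returns `None` if the line is not a segment line or lacks required fields.
fn parse_segment(line: &str, keep_optional: bool) -> Option<Segment> {
    let mut fields = line.split('\t');
    if fields.next()? != "S" {
        return None;
    }
    let name = fields.next()?.to_string();
    let sequence = fields.next()?.to_string();
    // Everything after the required fields is optional tag data.
    let tags = if keep_optional {
        Some(fields.map(String::from).collect())
    } else {
        None
    };
    Some(Segment { name, sequence, tags })
}
```

When `keep_optional` is false the remaining fields are never copied, which is where the memory and time savings would come from.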
The PanSN spec is a naming specification for sequences (paths) in variation graphs stored in GFA format. gfa-reader supports it through the Pansn struct, which lets you work with collections at the genome, haplotype or path level, depending on the use case.
/// use gfa_reader::{Gfa, Pansn, Path};
///
/// let mut graph: Gfa<()> = Gfa::new();
/// graph.parse_gfa_file("data/size5.gfa", false);
/// let pansn: Pansn<Path> = Pansn::from_graph(&graph.paths, " ");
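What grouping at the genome or haplotype level means can be sketched without the library. The example below assumes path names of the form `sample<sep>haplotype<sep>contig` with a caller-supplied separator; the function `group_paths` is hypothetical and is not part of gfa-reader.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: group PanSN-style path names either by genome
/// (first field) or by haplotype (first two fields).
/// Names with too few fields fall back to the coarsest key available.
fn group_paths<'a>(
    names: &[&'a str],
    sep: &str,
    by_haplotype: bool,
) -> BTreeMap<String, Vec<&'a str>> {
    let mut groups: BTreeMap<String, Vec<&'a str>> = BTreeMap::new();
    for &name in names {
        // At most three fields: sample, haplotype, contig.
        let parts: Vec<&str> = name.splitn(3, sep).collect();
        let key = match (by_haplotype, parts.as_slice()) {
            (true, [sample, hap, ..]) => format!("{}{}{}", sample, sep, hap),
            (_, [sample, ..]) => sample.to_string(),
            _ => name.to_string(),
        };
        groups.entry(key).or_default().push(name);
    }
    groups
}
```

For the names `A#1#chr1`, `A#1#chr2`, `A#2#chr1` and `B#1#chr1` with separator `#`, grouping by genome gives two groups (`A`, `B`) and grouping by haplotype gives three (`A#1`, `A#2`, `B#1`).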
We recommend using NCGfa in every scenario since there are two main advantages:
For any output that refers to features of the graph (for example node ids), don't forget to convert the node ids back so that they match the input graph.
Convert the graph to numeric and compact node ids before parsing. This saves time for parsing and makes computation faster.
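Converting compact ids back to the ids of the input graph can be sketched as a simple table lookup. This assumes you kept the original ids in a slice where position `i` holds the original id of compact id `i + 1`; the function `restore_ids` is hypothetical and not part of gfa-reader.

```rust
/// Hypothetical helper: translate compact ids (starting at 1) back to the
/// original node ids. `original[i]` is the original id of compact id `i + 1`.
/// Returns `None` if any compact id is 0 or out of range.
fn restore_ids<'a>(compact: &[usize], original: &[&'a str]) -> Option<Vec<&'a str>> {
    compact
        .iter()
        .map(|&c| {
            // Compact ids start at 1, so shift to a 0-based index first.
            c.checked_sub(1).and_then(|i| original.get(i).copied())
        })
        .collect()
}
```

Returning `Option` makes an id that does not belong to the graph visible to the caller instead of silently producing wrong output.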