twobit

version: 0.2.1
source: src
created_at: 2020-08-04 14:29:14.750254
updated_at: 2022-06-26 09:14:19.295129
description: Pure Rust implementation of the TwoBit sequence file format
repository: https://github.com/jbethune/rust-twobit
id: 272909
size: 76,191
Ivan Smirnov (aldanor)

documentation

https://docs.rs/twobit

README

twobit

Efficient 2bit file reader, implemented in pure Rust.

Minimum supported Rust version: 1.51. License: MIT.

The 2bit file format is used to store genomic sequences on disk. It allows for fast access to specific parts of the genome.
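To illustrate why the format allows fast access to arbitrary positions, here is a minimal sketch of 2-bit packing as described in the UCSC 2bit specification (T = 00, C = 01, A = 10, G = 11, four bases per byte, first base in the most significant bits). The function names `encode_base`, `pack` and `unpack_range` are hypothetical and are not part of this crate's API; the sketch ignores file headers, N blocks and soft masks.

```rust
/// Encode one nucleotide into its 2-bit code (T=00, C=01, A=10, G=11).
/// Hypothetical helper, not part of the twobit crate.
fn encode_base(base: u8) -> Option<u8> {
    match base.to_ascii_uppercase() {
        b'T' => Some(0b00),
        b'C' => Some(0b01),
        b'A' => Some(0b10),
        b'G' => Some(0b11),
        _ => None, // anything else (e.g. N) has no 2-bit code
    }
}

/// Pack a DNA string into bytes: four bases per byte,
/// with the first base stored in the two most significant bits.
fn pack(seq: &str) -> Option<Vec<u8>> {
    let mut packed = vec![0u8; (seq.len() + 3) / 4];
    for (i, base) in seq.bytes().enumerate() {
        let code = encode_base(base)?;
        packed[i / 4] |= code << (6 - 2 * (i % 4));
    }
    Some(packed)
}

/// Decode only the bases in `start..end`; base `i` lives in byte `i / 4`,
/// so no other part of the buffer has to be read.
fn unpack_range(packed: &[u8], start: usize, end: usize) -> String {
    const BASES: [char; 4] = ['T', 'C', 'A', 'G'];
    (start..end)
        .map(|i| {
            let code = (packed[i / 4] >> (6 - 2 * (i % 4))) & 0b11;
            BASES[code as usize]
        })
        .collect()
}
```

Because the byte and bit offset of any base follow directly from its index, a reader can seek straight to the requested region instead of scanning the sequence from the start.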

This crate is inspired by py2bit and tries to offer similar functionality with no C dependency, no external crate dependencies, and great performance. It follows version 0 of the 2bit specification.

Examples

use twobit::TwoBitFile;

let mut tb = TwoBitFile::open("assets/foo.2bit")?;
assert_eq!(tb.chrom_names(), &["chr1", "chr2"]);
assert_eq!(tb.chrom_sizes(), &[150, 100]);
let expected_seq = "NNACGTACGTACGTAGCTAGCTGATC";
assert_eq!(tb.read_sequence("chr1", 48..74)?, expected_seq);

All sequence-related methods expect a range argument; pass .. (the unbounded range) to query the entire sequence:

assert_eq!(tb.read_sequence("chr1", ..)?.len(), 150);
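For readers curious how a single method can accept `48..74` and `..` alike, the following sketch shows one common way to do it with the standard `RangeBounds` trait. The helper `resolve_range` is hypothetical, and clamping out-of-range bounds to the sequence length is an assumption made for this example; the crate itself may handle such ranges differently (for instance by returning an error).

```rust
use std::ops::{Bound, RangeBounds};

/// Resolve any range expression (`a..b`, `a..`, `..b`, `..`, `a..=b`)
/// into a concrete half-open `(start, end)` pair.
/// Hypothetical helper: bounds past `len` are clamped (an assumption).
fn resolve_range(range: impl RangeBounds<usize>, len: usize) -> (usize, usize) {
    let start = match range.start_bound() {
        Bound::Included(&s) => s,
        Bound::Excluded(&s) => s.saturating_add(1),
        Bound::Unbounded => 0, // `..b` and `..` start at the first base
    };
    let end = match range.end_bound() {
        Bound::Included(&e) => e.saturating_add(1),
        Bound::Excluded(&e) => e,
        Bound::Unbounded => len, // `a..` and `..` run to the end
    };
    let end = end.min(len);
    (start.min(end), end)
}
```

With a chromosome of length 150, `resolve_range(.., 150)` yields `(0, 150)` and `resolve_range(48..74, 150)` yields `(48, 74)`.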

Files can be fully cached in memory to provide fast random access and avoid any I/O operations when decoding:

let mut tb_mem = TwoBitFile::open_and_read("assets/foo.2bit")?;
let expected_seq = tb.read_sequence("chr1", ..)?;
assert_eq!(tb_mem.read_sequence("chr1", ..)?, expected_seq);
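The general idea behind in-memory caching can be sketched with the standard library alone: code written against `Read + Seek` works the same on a file and on a `Cursor` over a byte buffer. This is only an illustration of the technique, not a description of how `open_and_read` is implemented; `read_at` and `cache_in_memory` are hypothetical names.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

/// Read `len` bytes starting at `offset` from any seekable source
/// (a `File`, a `Cursor<Vec<u8>>`, ...). Hypothetical helper.
fn read_at<R: Read + Seek>(source: &mut R, offset: u64, len: usize) -> io::Result<Vec<u8>> {
    source.seek(SeekFrom::Start(offset))?;
    let mut buf = vec![0u8; len];
    source.read_exact(&mut buf)?;
    Ok(buf)
}

/// Load everything from `source` into memory and wrap it in a `Cursor`,
/// so later seeks and reads never touch the underlying source again.
fn cache_in_memory<R: Read>(mut source: R) -> io::Result<Cursor<Vec<u8>>> {
    let mut bytes = Vec::new();
    source.read_to_end(&mut bytes)?;
    Ok(Cursor::new(bytes))
}
```

The trade-off is the usual one: the cached variant pays the full read cost and memory footprint up front, then serves every later query from RAM.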

2bit files offer two types of masks: N masks (aka hard masks) for unknown or arbitrary nucleotides, and soft masks for lower-case nucleotides (e.g. "t" instead of "T").

Hard masks are always enabled; soft masks are disabled by default, but can be enabled manually:

let mut tb_soft = tb.enable_softmask(true);
let expected_seq = "NNACGTACGTACGTagctagctGATC";
assert_eq!(tb_soft.read_sequence("chr1", 48..74)?, expected_seq);
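Conceptually, both mask types are lists of position blocks applied on top of the decoded bases. The sketch below shows that idea in isolation; `apply_masks`, the `Block` alias and the block coordinates in the usage note are invented for illustration and are not taken from the crate or from `assets/foo.2bit`.

```rust
/// A half-open block of positions, `start..end`, in sequence coordinates.
type Block = (usize, usize);

/// Apply hard (N) masks and, optionally, soft masks to decoded bases.
/// `offset` is the sequence position of the first character in `decoded`.
/// Hypothetical helper, not part of the twobit crate.
fn apply_masks(
    decoded: &str,
    offset: usize,
    n_blocks: &[Block],
    soft_blocks: &[Block],
    softmask: bool,
) -> String {
    let inside = |pos: usize, blocks: &[Block]| blocks.iter().any(|&(s, e)| s <= pos && pos < e);
    decoded
        .chars()
        .enumerate()
        .map(|(i, base)| {
            let pos = offset + i;
            if inside(pos, n_blocks) {
                'N' // hard masks always win
            } else if softmask && inside(pos, soft_blocks) {
                base.to_ascii_lowercase() // soft masks only when enabled
            } else {
                base
            }
        })
        .collect()
}
```

For example, with decoded bases "TTACGTACGT" starting at position 48, an N block of (40, 50) and a soft block of (54, 58), the result is "NNACGTacgt" with soft masks enabled and "NNACGTACGT" with them disabled.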