rust-parallelfastx

version: 0.1.1
source: src
created_at: 2022-10-06 14:03:32.039023
updated_at: 2022-10-06 14:36:44.467063
description: Parallel iteration of FASTA/FASTQ files, for when sequence order doesn't matter but speed does
id: 681351
size: 25,355
Rayan Chikhi (rchikhi)


Rust-parallelfastx

A truly parallel parser for FASTA/FASTQ files.

Principle

The input file is memory-mapped, then virtually split into N chunks. Each chunk is fed to a regular FASTA/FASTQ parser (here, the excellent seq_io library, https://github.com/markschl/seq_io).
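
The chunking idea can be illustrated with a minimal, std-only sketch. This is not this crate's API or its actual implementation: an in-memory byte slice stands in for the memory-mapped file, a tiny hand-written counter stands in for seq_io, and all function names (`next_record_start`, `split_chunks`, `parse_chunk`, `parallel_count`) are hypothetical. It handles FASTA only, on the assumption that a record starts at a `>` at the beginning of a line; finding FASTQ record boundaries is more involved because `@` can also appear in quality lines.

```rust
use std::thread;

/// Offset of the first FASTA header at or after `pos`: a '>' located at the
/// start of the buffer or right after a newline. Returns `data.len()` if none.
fn next_record_start(data: &[u8], pos: usize) -> usize {
    let mut i = pos;
    while i < data.len() {
        if data[i] == b'>' && (i == 0 || data[i - 1] == b'\n') {
            return i;
        }
        i += 1;
    }
    data.len()
}

/// Split `data` into at most `n` chunks, each starting on a record boundary.
/// Nothing is copied: the chunks are sub-slices of the original buffer.
fn split_chunks(data: &[u8], n: usize) -> Vec<&[u8]> {
    let n = n.max(1);
    // Start from evenly spaced offsets and move each one forward to the next header.
    let mut bounds: Vec<usize> = (0..n)
        .map(|k| next_record_start(data, k * data.len() / n))
        .collect();
    bounds.push(data.len());
    bounds.dedup(); // several offsets may land on the same header
    bounds.windows(2).map(|w| &data[w[0]..w[1]]).collect()
}

/// Stand-in for a real parser: count records and sequence bytes in one chunk.
/// (Assumes Unix line endings.)
fn parse_chunk(chunk: &[u8]) -> (usize, usize) {
    let mut records = 0;
    let mut bases = 0;
    for line in chunk.split(|&b| b == b'\n') {
        if line.first() == Some(&b'>') {
            records += 1;
        } else {
            bases += line.len();
        }
    }
    (records, bases)
}

/// Parse every chunk in its own scoped thread and sum the per-chunk results.
fn parallel_count(data: &[u8], n_threads: usize) -> (usize, usize) {
    let chunks = split_chunks(data, n_threads);
    thread::scope(|s| {
        let handles: Vec<_> = chunks
            .iter()
            .map(|&chunk| s.spawn(move || parse_chunk(chunk)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().unwrap())
            .fold((0, 0), |acc, x| (acc.0 + x.0, acc.1 + x.1))
    })
}

fn main() {
    let fasta = b">r1\nACGT\nAC\n>r2\nGG\n>r3\nTTTT\n";
    let (records, bases) = parallel_count(fasta, 2);
    println!("{records} records, {bases} bases"); // prints "3 records, 12 bases"
}
```

Because each chunk is handled independently, records are not visited in file order, which matches the "sequence order doesn't matter" premise in the description.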

Rationale

Virtually all other "multithreaded" FASTA/FASTQ parsers use only one thread to parse the file, then feed the parsed sequences to worker threads. If your disk is fast enough (> 2 GB/s) that parsing becomes a CPU bottleneck, you might benefit from this library, since the parsing itself is multithreaded.

How to use

See src/main.rs, which should be self-explanatory.

Inspiration

This repository was inspired by the amazing fastlwc-mt tool from https://github.com/expr-fi/fastlwc, which does multi-threaded line counting.

Caveat

The input file needs to be seekable, which rules out all compression methods except blocked ones. Blocked compression is not currently supported by this library, but could be in principle.

Author

Rayan Chikhi, 2022
