prseq

Crate: prseq (Crates.io, lib.rs)
Version: 0.0.33
Created: 2025-10-07 10:41:59 UTC
Updated: 2025-11-23 16:14:01 UTC
Description: Rust tools (with Python bindings) for sequence analysis
Repository: https://github.com/virologyCharite/prseq
Crate id: 1871487
Size: 34,996
Terry Jones (terrycojones)

prseq (Rust)

High-performance Rust library for FASTA and FASTQ sequence parsing.

Overview

prseq is a Rust library providing fast, memory-efficient parsers for FASTA and FASTQ sequence formats. It features:

  • High Performance: Zero-copy parsing where possible with optimized buffered I/O
  • Streaming Iterators: Process files larger than available RAM
  • Automatic Compression: Built-in support for gzip and bzip2
  • Flexible Input: Works with files, stdin, or any Read trait
  • Format Support: Full FASTA and FASTQ with multi-line sequences
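
As an illustration of how compressed input can be recognized without relying on file extensions, here is a minimal std-only sketch that classifies a stream by its leading magic bytes (gzip streams start with 0x1f 0x8b, bzip2 streams with "BZh"). The names `Compression`, `sniff`, and `sniff_reader` are hypothetical and are not part of the prseq API; how prseq itself detects compression is not stated here.

```rust
use std::io::{self, BufRead};

/// Compression formats that can be told apart from their leading bytes.
/// (Hypothetical type for this sketch; not part of prseq.)
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Compression {
    Gzip,
    Bzip2,
    None,
}

/// Classify a byte prefix by its magic number.
pub fn sniff(prefix: &[u8]) -> Compression {
    if prefix.starts_with(&[0x1f, 0x8b]) {
        // gzip member header: ID1 = 0x1f, ID2 = 0x8b
        Compression::Gzip
    } else if prefix.starts_with(b"BZh") {
        // bzip2 stream header: "BZ" signature followed by 'h'
        Compression::Bzip2
    } else {
        Compression::None
    }
}

/// Peek at a buffered reader without consuming any input.
/// Note: `fill_buf` may return fewer bytes than the longest magic number
/// on an unusual reader; a production version would handle that case.
pub fn sniff_reader<R: BufRead>(reader: &mut R) -> io::Result<Compression> {
    // `fill_buf` exposes the buffered bytes but does not advance the reader,
    // so the decoder chosen afterwards still sees the full stream.
    let buf = reader.fill_buf()?;
    Ok(sniff(buf))
}
```

Because the reader is only peeked at, the same `BufRead` can afterwards be handed to a decompressor or directly to a parser.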

This library also powers the Python prseq package, which provides Python bindings and CLI tools.

Installation

Add to your Cargo.toml:

[dependencies]
prseq = "0.0.33"

Rust API Reference

FASTA Parsing

use prseq::fasta::{FastaReader, FastaRecord, read_fasta};
use std::fs::File;

// Read all records into memory
let records = read_fasta("sequences.fasta")?;
for record in records {
    println!("{}: {} bp", record.id, record.sequence.len());
}

// Stream records (memory efficient)
let mut reader = FastaReader::from_file("large.fasta")?;
for result in reader {
    let record = result?;
    if record.sequence.len() > 1000 {
        println!("Long sequence: {}", record.id);
    }
}

// Read from stdin
let mut reader = FastaReader::from_stdin()?;
for result in reader {
    let record = result?;
    println!("Read: {}", record.id);
}

// Performance tuning
let mut reader = FastaReader::from_file_with_capacity("file.fasta", 50000)?;

// Works with any Read trait
let file = File::open("sequences.fasta")?;
let mut reader = FastaReader::from_reader_with_capacity(file, 8192)?;

FASTQ Parsing

use prseq::fastq::{FastqReader, FastqRecord, read_fastq};
use std::fs::File;

// Read all records into memory
let records = read_fastq("reads.fastq")?;
for record in records {
    println!("{}: {} bp, quality: {}",
             record.id, record.sequence.len(), record.quality.len());
}

// Stream records (memory efficient)
let mut reader = FastqReader::from_file("large.fastq")?;
for result in reader {
    let record = result?;
    // Quality and sequence lengths are automatically validated
    assert_eq!(record.sequence.len(), record.quality.len());
}

// Read from stdin
let mut reader = FastqReader::from_stdin()?;

// Performance tuning for different read lengths
let mut reader = FastqReader::from_file_with_capacity("reads.fastq", 150)?; // Short reads
let mut reader = FastqReader::from_file_with_capacity("nanopore.fastq", 10000)?; // Long reads

// Works with any Read trait (including compressed streams)
use flate2::read::GzDecoder;
let file = File::open("reads.fastq.gz")?;
let decoder = GzDecoder::new(file);
let mut reader = FastqReader::from_reader_with_capacity(decoder, 1024)?;

Development

Building

cd rust
cargo build --release

Testing

cd rust
cargo test

Publishing

cd rust
cargo publish

Format Support

FASTA Format

  • Header lines starting with >
  • Multi-line sequences (automatic concatenation)
  • Empty lines ignored
  • Compression: gzip (.gz), bzip2 (.bz2)
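
The FASTA rules above (a `>` header, multi-line sequences joined together, empty lines skipped) can be sketched as a small streaming iterator over any `BufRead`. This is a std-only illustration, not prseq's implementation: the `Record` and `Records` names are hypothetical, and the sketch assumes that `id` holds the whole header text after `>`.

```rust
use std::io::{self, BufRead};

/// One FASTA record (hypothetical type for this sketch).
#[derive(Debug, PartialEq, Eq)]
pub struct Record {
    pub id: String,
    pub sequence: String,
}

/// Streaming iterator: holds one record in memory at a time.
pub struct Records<R: BufRead> {
    lines: io::Lines<R>,
    // Header of the next record, seen while reading the previous one.
    pending_header: Option<String>,
}

impl<R: BufRead> Records<R> {
    pub fn new(reader: R) -> Self {
        Records { lines: reader.lines(), pending_header: None }
    }
}

impl<R: BufRead> Iterator for Records<R> {
    type Item = io::Result<Record>;

    fn next(&mut self) -> Option<Self::Item> {
        // Step 1: obtain a header, either saved earlier or read now.
        let id = loop {
            if let Some(h) = self.pending_header.take() {
                break h;
            }
            match self.lines.next()? {
                Err(e) => return Some(Err(e)),
                Ok(line) => {
                    let line = line.trim_end();
                    if line.is_empty() {
                        continue; // empty lines are ignored
                    }
                    match line.strip_prefix('>') {
                        Some(h) => break h.to_string(),
                        None => {
                            return Some(Err(io::Error::new(
                                io::ErrorKind::InvalidData,
                                "sequence data before first '>' header",
                            )))
                        }
                    }
                }
            }
        };

        // Step 2: concatenate sequence lines until the next header or EOF.
        let mut sequence = String::new();
        for line in self.lines.by_ref() {
            let line = match line {
                Ok(l) => l,
                Err(e) => return Some(Err(e)),
            };
            let line = line.trim_end();
            if line.is_empty() {
                continue;
            }
            if let Some(h) = line.strip_prefix('>') {
                self.pending_header = Some(h.to_string());
                break;
            }
            sequence.push_str(line);
        }
        Some(Ok(Record { id, sequence }))
    }
}
```

For example, `Records::new(">a\nAC\nGT\n".as_bytes())` yields a single record with `id` "a" and `sequence` "ACGT". Because only the current record is buffered, the same pattern works for inputs larger than available RAM.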

FASTQ Format

  • Four-field records: @header, sequence, +[optional_header], quality
  • Sequence and quality fields may each span multiple lines
  • Optional header validation on + line
  • Automatic sequence/quality length validation
  • Compression: gzip (.gz), bzip2 (.bz2)
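
The validation rules listed above can be sketched for the simplest case, a record whose sequence and quality each occupy one line. This is a std-only illustration under that assumption, not prseq's implementation: `ParsedRead` and `parse_record` are hypothetical names, and multi-line records are deliberately out of scope.

```rust
/// One FASTQ record (hypothetical type for this sketch).
#[derive(Debug, PartialEq, Eq)]
pub struct ParsedRead {
    pub id: String,
    pub sequence: String,
    pub quality: String,
}

/// Validate and parse the four lines of a single-line-per-field record.
pub fn parse_record(lines: [&str; 4]) -> Result<ParsedRead, String> {
    let [header, sequence, plus, quality] = lines;

    // Line 1 must start with '@'.
    let id = header
        .strip_prefix('@')
        .ok_or_else(|| format!("header must start with '@': {}", header))?;

    // Line 3 must start with '+'; anything after it is an optional
    // repeat of the header and, if present, must match.
    let repeated = plus
        .strip_prefix('+')
        .ok_or_else(|| format!("separator must start with '+': {}", plus))?;
    if !repeated.is_empty() && repeated != id {
        return Err(format!("'+' line header {:?} does not match {:?}", repeated, id));
    }

    // Every base needs exactly one quality character.
    if sequence.len() != quality.len() {
        return Err(format!(
            "sequence length {} != quality length {}",
            sequence.len(),
            quality.len()
        ));
    }

    Ok(ParsedRead {
        id: id.to_string(),
        sequence: sequence.to_string(),
        quality: quality.to_string(),
    })
}
```

Calling `parse_record(["@r1", "ACGT", "+", "IIII"])` succeeds, while a mismatched `+r2` line or a three-character quality string returns an error.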

Python Bindings

For Python users, see the Python prseq package which provides:

  • Pythonic API with full type hints
  • Command-line tools (fasta-info, fastq-stats, etc.)
  • Easy installation via pip/uv

License

This project is licensed under the MIT License - see the LICENSE file for details.
