parallel_bzip2_decoder

Crates.io: parallel_bzip2_decoder
lib.rs: parallel_bzip2_decoder
version: 0.2.0
created_at: 2025-12-06 16:38:44.671504+00
updated_at: 2025-12-11 19:39:07.068942+00
description: High-performance parallel bzip2 decompression library
homepage: https://github.com/parallel-bz2/parallel-bz2
repository: https://github.com/parallel-bz2/parallel-bz2
max_upload_size
id: 1970399
size: 108,318
Gautier Portet (kassoulet)

documentation: https://docs.rs/parallel_bzip2_decoder

README

parallel_bzip2_decoder

A high-performance, parallel bzip2 decoder for Rust.

This crate provides a Bz2Decoder that implements std::io::Read, allowing you to decompress bzip2 files in parallel using multiple CPU cores. It is designed to work efficiently with both single-stream (standard) and multi-stream (e.g., pbzip2) bzip2 files by scanning for block boundaries and decompressing them concurrently.
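To make "scanning for block boundaries" concrete, here is a minimal, std-only sketch. It is not this crate's scanner. It relies only on a property of the bzip2 format: every compressed block begins with the 48-bit magic 0x314159265359, and blocks are not byte-aligned, so the search has to run bit by bit. The function name find_block_starts is hypothetical.

```rust
/// 48-bit magic that starts every bzip2 compressed block (BCD digits of pi).
const BLOCK_MAGIC: u64 = 0x3141_5926_5359;
/// Mask that keeps only the low 48 bits of the sliding window.
const MASK_48: u64 = (1 << 48) - 1;

/// Hypothetical helper: returns the bit offset of every position where the
/// block magic begins. Candidates may include false positives that happen to
/// occur inside compressed data, so a real decoder must still validate them.
fn find_block_starts(data: &[u8]) -> Vec<u64> {
    let mut starts = Vec::new();
    let mut window: u64 = 0;
    for (i, &byte) in data.iter().enumerate() {
        // bzip2 packs bits most-significant first.
        for bit in (0..8u32).rev() {
            window = ((window << 1) | u64::from((byte >> bit) & 1)) & MASK_48;
            let bits_seen = i as u64 * 8 + u64::from(7 - bit) + 1;
            // Require a full 48-bit window before comparing.
            if bits_seen >= 48 && window == BLOCK_MAGIC {
                starts.push(bits_seen - 48);
            }
        }
    }
    starts
}

fn main() {
    // A "BZh9" stream header followed directly by one block magic.
    let mut data = b"BZh9".to_vec();
    data.extend_from_slice(&[0x31, 0x41, 0x59, 0x26, 0x53, 0x59]);
    let starts = find_block_starts(&data);
    assert_eq!(starts, vec![32]);
    println!("candidate block starts (bit offsets): {:?}", starts);
}
```

Once candidate offsets are known, each block can be handed to a separate worker, which is what makes parallel decoding of a single-stream file possible.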

Features

  • Parallel Decompression: Utilizes rayon to decompress blocks in parallel.
  • Standard API: Implements std::io::Read for easy integration.
  • Memory Mapped: Efficiently handles large files using memory mapping.
  • Flexible: Supports opening files directly or working with in-memory buffers (via Arc).
  • Full bzip2 Format Support: Handles both single-stream and multi-stream bzip2 files.
  • Error Handling: Comprehensive error reporting with anyhow integration.
  • Memory Efficient: Bounded channels and buffer reuse to minimize memory usage.
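
The combination of parallel workers and bounded channels can be illustrated with a small std-only sketch. It is an illustration of the general pattern, not this crate's implementation: the crate uses rayon, whereas the sketch below uses plain threads and std::sync::mpsc::sync_channel. The names fake_decode and decode_in_order are hypothetical, and fake_decode merely uppercases bytes as a stand-in for real block decompression.

```rust
use std::collections::BTreeMap;
use std::sync::mpsc::sync_channel;
use std::sync::Arc;
use std::thread;

/// Stand-in for decompressing one block; a real decoder would run bzip2 here.
fn fake_decode(block: &[u8]) -> Vec<u8> {
    block.iter().map(|b| b.to_ascii_uppercase()).collect()
}

/// Decodes blocks on `workers` threads and returns the output in block order.
/// The channel holds at most `capacity` finished blocks, so fast workers
/// block on `send` instead of piling up results without limit.
fn decode_in_order(blocks: Vec<Vec<u8>>, workers: usize, capacity: usize) -> Vec<u8> {
    let workers = workers.max(1);
    let blocks = Arc::new(blocks);
    let (tx, rx) = sync_channel::<(usize, Vec<u8>)>(capacity);

    let mut handles = Vec::new();
    for w in 0..workers {
        let tx = tx.clone();
        let blocks = Arc::clone(&blocks);
        handles.push(thread::spawn(move || {
            // Worker `w` handles blocks w, w + workers, w + 2 * workers, ...
            let mut i = w;
            while i < blocks.len() {
                if tx.send((i, fake_decode(&blocks[i]))).is_err() {
                    break; // Receiver is gone; stop early.
                }
                i += workers;
            }
        }));
    }
    drop(tx); // The receive loop ends once every worker has finished.

    // Results may arrive out of order; park them until their turn comes.
    let mut pending = BTreeMap::new();
    let mut next = 0usize;
    let mut out = Vec::new();
    for (idx, data) in rx {
        pending.insert(idx, data);
        while let Some(data) = pending.remove(&next) {
            out.extend_from_slice(&data);
            next += 1;
        }
    }
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
    out
}

fn main() {
    let blocks = vec![b"ab".to_vec(), b"cd".to_vec(), b"ef".to_vec()];
    let out = decode_in_order(blocks, 2, 1);
    assert_eq!(out, b"ABCDEF");
}
```

In this sketch only the channel is bounded; the reorder map can still grow when one block is much slower than its neighbours, which is the kind of detail a production decoder has to handle separately.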

Usage

Add this to your Cargo.toml:

[dependencies]
parallel_bzip2_decoder = "0.2"

Decompressing a File

The easiest way to use parallel_bzip2_decoder is to use Bz2Decoder::open, which handles memory mapping internally:

use parallel_bzip2_decoder::Bz2Decoder;
use std::io::Read;

fn main() -> anyhow::Result<()> {
    let mut decoder = Bz2Decoder::open("input.bz2")?;
    let mut buffer = Vec::new();
    decoder.read_to_end(&mut buffer)?;
    println!("Decompressed {} bytes", buffer.len());
    Ok(())
}
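
For large archives, read_to_end holds the entire decompressed output in memory. Because the decoder implements std::io::Read, it can instead be consumed in fixed-size chunks. The sketch below is std-only and generic over any reader; count_newlines is a hypothetical helper, and a Cursor stands in for the decoder so that the example runs without a .bz2 file.

```rust
use std::io::{self, Read};

/// Hypothetical helper: counts newline bytes from any reader using a fixed
/// 64 KiB buffer, so memory use does not grow with the size of the input.
fn count_newlines<R: Read>(mut reader: R) -> io::Result<u64> {
    let mut chunk = [0u8; 64 * 1024];
    let mut count = 0u64;
    loop {
        let n = match reader.read(&mut chunk) {
            Ok(0) => break, // End of stream.
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        count += chunk[..n].iter().filter(|&&b| b == b'\n').count() as u64;
    }
    Ok(count)
}

fn main() -> io::Result<()> {
    // Stand-in reader; a value returned by Bz2Decoder::open could be passed
    // here instead, since it also implements Read.
    let stand_in = io::Cursor::new(b"a\nb\nc\n".to_vec());
    let lines = count_newlines(stand_in)?;
    assert_eq!(lines, 3);
    println!("{lines} lines");
    Ok(())
}
```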

Decompressing from Memory

If you already have the data in memory (e.g., an Arc<[u8]> or Arc<Mmap>), you can use Bz2Decoder::new:

use parallel_bzip2_decoder::Bz2Decoder;
use std::io::Read;
use std::sync::Arc;

fn main() -> anyhow::Result<()> {
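    // Compressed bytes already held in memory (elided here).
    let data: Vec<u8> = vec![/* ... bzip2 data ... */];
    // Bz2Decoder::new takes the data behind an Arc.
    let data_arc = Arc::new(data);
    let mut decoder = Bz2Decoder::new(data_arc);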

    let mut buffer = Vec::new();
    decoder.read_to_end(&mut buffer)?;
    Ok(())
}

Performance

parallel_bzip2_decoder scales linearly with the number of available CPU cores. It is significantly faster than standard single-threaded decoders for large files.
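
To check how throughput behaves on your own data and hardware, a rough timing harness is enough for a first impression. The sketch below is std-only: drain_timed is a hypothetical helper that works with any std::io::Read, and a zero-filled Cursor stands in for the decoder. For rigorous numbers, prefer the benchmark suites described in the next section.

```rust
use std::io::{self, Read};
use std::time::{Duration, Instant};

/// Hypothetical helper: reads `reader` to the end, discards the output, and
/// returns the number of bytes produced together with the elapsed time.
fn drain_timed<R: Read>(mut reader: R) -> io::Result<(u64, Duration)> {
    let start = Instant::now();
    let bytes = io::copy(&mut reader, &mut io::sink())?;
    Ok((bytes, start.elapsed()))
}

fn main() -> io::Result<()> {
    // Logical cores visible to this process; falls back to 1 if unknown.
    let cores = std::thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);

    // Stand-in input; pass a Bz2Decoder here to time real decompression.
    let input = io::Cursor::new(vec![0u8; 1 << 20]);
    let (bytes, elapsed) = drain_timed(input)?;

    let secs = elapsed.as_secs_f64().max(f64::EPSILON);
    let mib_per_s = bytes as f64 / (1024.0 * 1024.0) / secs;
    println!("{bytes} bytes in {elapsed:?} ({mib_per_s:.1} MiB/s), {cores} logical cores available");
    Ok(())
}
```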

Benchmarking and Profiling

This crate includes comprehensive benchmarks and profiling tools:

  • Decode benchmarks: Test decompression with various file sizes (1MB, 10MB, 50MB)
  • Scanner benchmarks: Measure block scanning performance
  • End-to-end benchmarks: Test the full decompression pipeline
  • CPU profiling: Generate flamegraphs to identify performance bottlenecks
  • Memory profiling: Track memory usage and detect leaks

Running Benchmarks

# Run all benchmarks
cargo bench

# Run specific benchmark suite
cargo bench --bench decode_benchmark
cargo bench --bench scanner_benchmark
cargo bench --bench e2e_benchmark

Profiling

# CPU profiling with flamegraphs
cd ../scripts
./profile_cpu.sh

# Memory profiling with valgrind
./profile_memory.sh

For detailed instructions, see BENCHMARKING.md.

API Stability

This crate follows semantic versioning. Breaking changes will only occur with major version updates.

License

MIT

Contributing

See the main repository's CONTRIBUTING.md for details on how to contribute.

Changelog

See CHANGELOG.md for a history of changes (when available).

