| Crates.io | ibu |
| lib.rs | ibu |
| version | 0.2.1 |
| created_at | 2024-12-04 18:40:23.257604+00 |
| updated_at | 2025-12-03 21:41:25.712536+00 |
| description | A library for high throughput binary encoding genomic sequences |
| homepage | |
| repository | https://github.com/noamteyssier/ibu |
| max_upload_size | |
| id | 1472397 |
| size | 159,856 |
ibu is a Rust library for efficiently handling binary-encoded barcode, UMI, and index data in
high-throughput genomics applications.
It is designed to be fast, memory-efficient, and easy to use.
It is heavily inspired by the BUS binary format and is even more minimal.
The binary format consists of a header followed by a collection of records.
The header is strictly defined in the following 32 bytes:
| Field | Type | Description |
|---|---|---|
| Magic | u32 | File type identifier: 0x21554249 ("IBU!") |
| Version | u32 | The version of the binary format (currently 2) |
| Barcode Length | u32 | The length of the barcode field in bases (MAX = 32) |
| UMI Length | u32 | The length of the UMI field in bases (MAX = 32) |
| Flags | u64 | Bit flags (bit 0: sorted, rest reserved for future use) |
| Record Count | u64 | Total number of records (0 if unknown) |
| Reserved | [u8; 8] | Reserved bytes for future extensions |
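The magic value and the sorted flag can be checked with plain integer operations. The sketch below assumes the header is stored little-endian, which is what makes 0x21554249 spell "IBU!" on disk. The constant and function names are illustrative and are not taken from the ibu API.

```rust
/// Magic number from the header table (hypothetical constant name).
const MAGIC: u32 = 0x2155_4249;
/// Bit 0 of the flags field marks a sorted file (hypothetical constant name).
const FLAG_SORTED: u64 = 1 << 0;

/// Returns true if the first four bytes spell "IBU!".
/// Assumes the magic field is stored little-endian.
fn has_ibu_magic(bytes: &[u8]) -> bool {
    bytes.len() >= 4 && bytes[..4] == MAGIC.to_le_bytes()
}

/// Returns true if the sorted bit is set in a flags value.
fn is_sorted(flags: u64) -> bool {
    flags & FLAG_SORTED != 0
}
```

A check like `has_ibu_magic` is a cheap way to reject a wrong file type before parsing anything else.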
The record is strictly defined in the following 24 bytes:
| Field | Type | Description |
|---|---|---|
| Barcode | u64 | The barcode represented with 2-bit encoding |
| UMI | u64 | The UMI represented with 2-bit encoding |
| Index | u64 | A numerical index (abstract, application-specific usage for users) |
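To make the 24-byte layout concrete, here is a minimal stand-in for a record: three `u64` fields written back to back. `RawRecord` is a hypothetical type for illustration only, not the library's own `Record`, and little-endian byte order is an assumption.

```rust
/// Illustrative 24-byte record; NOT the ibu `Record` type.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct RawRecord {
    barcode: u64,
    umi: u64,
    index: u64,
}

impl RawRecord {
    /// Serializes the three fields in table order (little-endian assumed).
    fn to_bytes(self) -> [u8; 24] {
        let mut out = [0u8; 24];
        out[0..8].copy_from_slice(&self.barcode.to_le_bytes());
        out[8..16].copy_from_slice(&self.umi.to_le_bytes());
        out[16..24].copy_from_slice(&self.index.to_le_bytes());
        out
    }

    /// Rebuilds a record from 24 bytes produced by `to_bytes`.
    fn from_bytes(bytes: [u8; 24]) -> Self {
        let word = |i: usize| {
            let mut w = [0u8; 8];
            w.copy_from_slice(&bytes[i * 8..(i + 1) * 8]);
            u64::from_le_bytes(w)
        };
        Self { barcode: word(0), umi: word(1), index: word(2) }
    }
}
```

Because every record has the same fixed width, the n-th record always starts at a predictable byte offset, which is what makes bulk and memory-mapped reads straightforward.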
Importantly, the barcode and UMI fields use 2-bit encoding, so each 64-bit field holds at most 32 bases (64 bits / 2 bits per base).
For 2-bit encoding and decoding in Rust, see bitnuc.
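As a rough illustration of why 32 bases is the ceiling, the sketch below packs a sequence into a `u64` at two bits per base. The mapping (A=0, C=1, G=2, T=3) and the choice to put the first base in the lowest bits are assumptions made for this example. They are not guaranteed to match bitnuc or ibu, so use bitnuc itself for real data.

```rust
/// Packs up to 32 bases into a u64, two bits per base.
/// Returns None for sequences longer than 32 bases or containing non-ACGT bytes.
/// Mapping and bit order are assumptions for this sketch.
fn encode_2bit(seq: &[u8]) -> Option<u64> {
    if seq.len() > 32 {
        return None;
    }
    let mut packed = 0u64;
    for (i, &base) in seq.iter().enumerate() {
        let code: u64 = match base {
            b'A' | b'a' => 0,
            b'C' | b'c' => 1,
            b'G' | b'g' => 2,
            b'T' | b't' => 3,
            _ => return None,
        };
        // Base i occupies bits 2i and 2i+1.
        packed |= code << (2 * i);
    }
    Some(packed)
}

/// Unpacks `len` bases from a value produced by `encode_2bit`.
fn decode_2bit(packed: u64, len: usize) -> Option<String> {
    if len > 32 {
        return None;
    }
    let seq = (0..len)
        .map(|i| match (packed >> (2 * i)) & 0b11 {
            0 => 'A',
            1 => 'C',
            2 => 'G',
            _ => 'T',
        })
        .collect();
    Some(seq)
}
```

Note that the packed value alone does not record the sequence length, which is why the header carries the barcode and UMI lengths.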
Users may choose to encode their own data into the index field or use it for other purposes.
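One possible use of the index field is to pack two small identifiers into the single `u64`. The split below (16 bits for a sample ID, 48 bits for a feature ID) and all names are hypothetical choices for illustration. ibu itself attaches no meaning to the index.

```rust
/// Number of low bits given to the feature ID (hypothetical split).
const FEATURE_BITS: u32 = 48;
const FEATURE_MASK: u64 = (1 << FEATURE_BITS) - 1;

/// Packs a sample ID into the high 16 bits and a feature ID into the low 48.
/// Returns None if the feature ID does not fit in 48 bits.
fn pack_index(sample: u16, feature: u64) -> Option<u64> {
    if feature > FEATURE_MASK {
        return None;
    }
    Some(((sample as u64) << FEATURE_BITS) | feature)
}

/// Splits a packed index back into (sample, feature).
fn unpack_index(index: u64) -> (u16, u64) {
    ((index >> FEATURE_BITS) as u16, index & FEATURE_MASK)
}
```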
The library provides detailed error handling through the IbuError enum.

    use ibu::{Header, Reader, Record, Writer};
    use std::io::Cursor;

    // Create a header for 16-base barcodes and 12-base UMIs
    let mut header = Header::new(16, 12);
    header.set_sorted(); // Mark as sorted if needed

    // Create some records
    let records = vec![
        Record::new(0x00001100, 0x100011, 0),
        Record::new(0x00001101, 0x100010, 1),
    ];

    // Write to a buffer
    let buffer = Vec::new();
    let mut writer = Writer::new(buffer, header)?;
    writer.write_batch(&records)?;
    writer.finish()?;

    // Get the written buffer
    let buffer = writer.into_inner();

    // The expected buffer should be 32 (header) + 24 * 2 (records) = 80 bytes
    assert_eq!(buffer.len(), 80);

    // Read from buffer
    let cursor = Cursor::new(buffer);
    let reader = Reader::new(cursor)?;

    // Access the header
    let header = reader.header();
    assert_eq!(header.bc_len, 16);
    assert_eq!(header.umi_len, 12);

    // Read the records
    let mut read_records = Vec::new();
    for record in reader {
        read_records.push(record?);
    }
    assert_eq!(records, read_records);
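The size check in the example generalizes: with a 32-byte header and 24-byte records, an uncompressed file with n records is 32 + 24n bytes long. The helpers below are a small sketch of that arithmetic. Their names are made up for this example and they are not part of ibu.

```rust
const HEADER_SIZE: u64 = 32;
const RECORD_SIZE: u64 = 24;

/// Size in bytes of an uncompressed file holding `n_records` records.
fn expected_file_size(n_records: u64) -> u64 {
    HEADER_SIZE + RECORD_SIZE * n_records
}

/// Number of records implied by an uncompressed file size, or None if the
/// size is too small for a header or leaves a partial record at the end.
fn records_in_file(file_size: u64) -> Option<u64> {
    let body = file_size.checked_sub(HEADER_SIZE)?;
    (body % RECORD_SIZE == 0).then(|| body / RECORD_SIZE)
}
```

A `None` from `records_in_file` is a quick signal that a file is truncated or is not a plain ibu file.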
For high-performance applications, ibu provides memory-mapped file reading with built-in parallel processing support:

    use ibu::{MmapReader, ParallelProcessor, ParallelReader, Record};
    use std::sync::{Arc, Mutex};

    // Define a custom processor
    #[derive(Clone, Default)]
    struct MyProcessor {
        local_count: u64,
        global_count: Arc<Mutex<u64>>,
    }

    impl ParallelProcessor for MyProcessor {
        fn process_record(&mut self, record: Record) -> ibu::Result<()> {
            self.local_count += 1;
            Ok(())
        }

        fn on_batch_complete(&mut self) -> ibu::Result<()> {
            let mut guard = self.global_count.lock().unwrap();
            *guard += self.local_count;
            self.local_count = 0;
            Ok(())
        }
    }

    // Use memory-mapped reader with parallel processing
    let reader = MmapReader::new("data.ibu")?;
    let processor = MyProcessor::default();
    reader.process_parallel(processor, 0)?; // 0 = use all available cores
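The processor above keeps a thread-local counter and only touches the shared mutex once per batch. The standalone sketch below shows the same pattern using only the standard library, so it can be run without ibu. The function name, the chunking scheme, and the batch size are assumptions for illustration and say nothing about how ibu schedules work internally.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Counts items across worker threads, locking the shared total once per
/// batch rather than once per item.
fn count_in_batches(items: &[u64], n_threads: usize, batch_size: usize) -> u64 {
    let global = Arc::new(Mutex::new(0u64));
    let n_threads = n_threads.max(1);
    // Ceiling division, with a floor of 1 so `chunks` never receives 0.
    let chunk_len = ((items.len() + n_threads - 1) / n_threads).max(1);

    thread::scope(|scope| {
        for chunk in items.chunks(chunk_len) {
            let global = Arc::clone(&global);
            scope.spawn(move || {
                for batch in chunk.chunks(batch_size.max(1)) {
                    // Per-item work only updates the local counter.
                    let mut local = 0u64;
                    for _item in batch {
                        local += 1;
                    }
                    // One lock acquisition per batch.
                    *global.lock().unwrap() += local;
                }
            });
        }
    });

    let total = *global.lock().unwrap();
    total
}
```

Keeping the per-record path free of locks matters because it runs once per record, while the batch hook runs far less often.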
Load entire files directly into memory:

    use ibu::load_to_vec;

    let (header, records) = load_to_vec("data.ibu")?;
    println!("Loaded {} records", records.len());
When the niffler feature is enabled (default), ibu automatically handles gzip and zstd compression:

    // Automatically detects and decompresses
    let reader = Reader::from_path("data.ibu.gz")?;
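Automatic detection of this kind is usually done by looking at the first few bytes of the input rather than the file extension. The sketch below shows the general idea using the standard gzip (`1f 8b`) and zstd (`28 b5 2f fd`) magic bytes. It is not niffler's or ibu's implementation, and the type and function names are invented for this example.

```rust
/// Compression formats this sketch can recognize (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum Compression {
    Gzip,
    Zstd,
    Uncompressed,
}

/// Guesses the compression format from the leading bytes of a stream.
fn sniff_compression(prefix: &[u8]) -> Compression {
    if prefix.starts_with(&[0x1f, 0x8b]) {
        Compression::Gzip
    } else if prefix.starts_with(&[0x28, 0xb5, 0x2f, 0xfd]) {
        Compression::Zstd
    } else {
        Compression::Uncompressed
    }
}
```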
ibu is designed for high-throughput applications and builds on bytemuck.
Contributions are welcome! Feel free to open an issue or submit a pull request.
This project is licensed under the MIT License.