| Crates.io | seq_chunking |
| lib.rs | seq_chunking |
| version | 0.1.0 |
| created_at | 2025-06-14 14:19:54.10859+00 |
| updated_at | 2025-06-14 14:19:54.10859+00 |
| description | SeqCDC (content defined chunking) in pure Rust. |
| homepage | |
| repository | https://github.com/puntakana/seqcdc-rs |
| max_upload_size | |
| id | 1712386 |
| size | 71,679 |
A Rust library for sequence-based data chunking using slope detection algorithms.
This library provides efficient algorithms for dividing data streams into chunks based on byte sequence patterns (increasing or decreasing slopes). It's particularly useful for content-defined chunking applications, data deduplication, and stream processing.
Add this to your Cargo.toml:
```toml
[dependencies]
seq_chunking = "0.1.0"
```
```rust
use seq_chunking::SeqChunking;

// Create a chunker with default settings
let chunker = SeqChunking::new();

// Chunk some data
let data = b"your data here";
let chunks: Vec<_> = chunker.chunk_all(data).collect();

// Inspect the resulting chunks
for chunk in &chunks {
    println!("Chunk: {} bytes at position {}", chunk.len, chunk.start);
}
```
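Because each `Chunk` records its `start` and `len`, a quick sanity check is possible. This is a sketch, assuming `chunk_all` yields chunks in input order and that chunks tile the input without gaps or overlaps:

```rust
// Sketch: confirm the chunks cover the input contiguously.
let mut expected_start = 0;
for chunk in &chunks {
    assert_eq!(chunk.start, expected_start);
    expected_start += chunk.len;
}
assert_eq!(expected_start, data.len());
```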
```rust
use seq_chunking::{SeqChunking, ChunkingConfig, SeqOpMode};

// Build a custom configuration
let config = ChunkingConfig::builder()
    .seq_threshold(10)               // Longer sequences needed
    .min_block_size(2048)            // 2KB minimum chunks
    .max_block_size(32768)           // 32KB maximum chunks
    .op_mode(SeqOpMode::Decreasing)  // Look for decreasing sequences
    .jump_trigger(100)               // Jump after 100 opposing slopes
    .build()
    .expect("Invalid configuration");

let chunker = SeqChunking::from_config(config);
let data = b"your data here";
let chunks: Vec<_> = chunker.chunk_all(data).collect();
```
```rust
use seq_chunking::{SeqChunking, utils::FileUtils};

// Read a file and chunk it
let data = FileUtils::read_file("input.dat")?;
let chunker = SeqChunking::new();
let chunks: Vec<_> = chunker.chunk_all(&data).collect();

// Write chunks back to a file
FileUtils::write_chunks_to_file("output.dat", &chunks)?;
```
The SeqChunking algorithm works by scanning the input for monotonic byte runs (slopes) and declaring a cut point once a sufficiently long run is found. Its behavior is controlled by the following parameters (a minimal sketch of the loop follows the list):

- `seq_threshold`: Number of consecutive sequence bytes needed to trigger a cut
- `min_block_size`: Minimum chunk size in bytes
- `max_block_size`: Maximum chunk size in bytes
- `jump_trigger`: Number of opposing slopes before jumping ahead
- `jump_size`: Number of bytes to skip when jumping
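To make the parameters concrete, here is an illustrative sketch of a slope-detection cut-point search. It is not the library's internal implementation; the parameter names simply mirror the `ChunkingConfig` fields above, and increasing-slope mode is assumed:

```rust
// Illustrative sketch only: not the crate's internal code.
// Parameter names mirror the ChunkingConfig fields documented above.
fn find_cut_point(
    data: &[u8],
    seq_threshold: usize,
    min_block_size: usize,
    max_block_size: usize,
    jump_trigger: usize,
    jump_size: usize,
) -> usize {
    let end = data.len().min(max_block_size);
    let mut run = 1;      // length of the current increasing run
    let mut opposing = 0; // consecutive slope-breaking bytes seen
    let mut i = min_block_size.max(1); // never cut before the minimum size
    while i < end {
        if data[i] > data[i - 1] {
            run += 1;
            opposing = 0;
            if run >= seq_threshold {
                return i + 1; // cut just after the qualifying sequence
            }
        } else {
            run = 1;
            opposing += 1;
            if opposing >= jump_trigger {
                i += jump_size; // skip ahead past an unpromising region
                opposing = 0;
                continue;
            }
        }
        i += 1;
    }
    end // no cut found: fall back to max_block_size (or end of input)
}

fn main() {
    // Hypothetical data and parameter values, for illustration only.
    let data: Vec<u8> = (0..100_000u32)
        .map(|i| (i.wrapping_mul(2654435761) >> 24) as u8)
        .collect();
    let cut = find_cut_point(&data, 5, 2048, 32_768, 100, 256);
    println!("first cut at byte offset {cut}");
}
```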
The library is designed for high performance with several optimizations. Typical throughput depends on your data and configuration; the included benchmarks (`cargo bench`, see below) measure it on your own hardware.
- `SeqChunking`: Main chunking algorithm implementation
- `ChunkingConfig`: Configuration parameters for the algorithm
- `Chunk`: Represents a single chunk with data and position information
- `ChunkIterator`: Iterator for streaming through chunks
- `utils::FileUtils`: File I/O operations
- `utils::ValidationUtils`: Data integrity verification
- `utils::TestDataGenerator`: Generate test data with specific patterns
- `utils::PerfUtils`: Performance measurement utilities

The library includes several examples demonstrating different use cases:
```bash
# Basic usage example
cargo run --example basic_usage

# File processing example
cargo run --example file_processing
```
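Content-defined chunking is what makes deduplication work: identical regions produce identical chunks regardless of their byte offset. The following is a hedged sketch built only on the `chunk_all`, `start`, and `len` API shown above; the hashing scheme and payload are illustrative:

```rust
use std::collections::HashMap;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use seq_chunking::SeqChunking;

fn main() {
    // Repeat a payload so the second copy yields duplicate chunks.
    let payload: Vec<u8> = (0..64 * 1024u32).map(|i| (i % 251) as u8).collect();
    let mut data = payload.clone();
    data.extend_from_slice(&payload);

    let chunker = SeqChunking::new();
    let mut counts: HashMap<u64, usize> = HashMap::new();
    for chunk in chunker.chunk_all(&data) {
        // Recover the chunk bytes from its position information.
        let bytes = &data[chunk.start..chunk.start + chunk.len];
        let mut hasher = DefaultHasher::new();
        bytes.hash(&mut hasher);
        *counts.entry(hasher.finish()).or_insert(0) += 1;
    }
    let repeated = counts.values().filter(|&&n| n > 1).count();
    println!("{} unique chunk hashes, {} seen more than once", counts.len(), repeated);
}
```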
Run the test suite:
```bash
# Run all tests
cargo test

# Run tests with output
cargo test -- --nocapture

# Run benchmarks
cargo bench
```
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
This project is licensed under the MIT license.