cdc-chunkers

Crates.io: cdc-chunkers
lib.rs: cdc-chunkers
version: 0.1.3
created_at: 2025-02-02 19:00:13.893826+00
updated_at: 2025-02-18 11:52:00.305464+00
description: A collection of Content Defined Chunking algorithms
repository: https://github.com/Piletskii-Oleg/rust-chunking
id: 1539805
size: 68,170
Piletskii Oleg (Piletskii-Oleg)

Crates.io MIT licensed

rust-chunking

An implementation of several content-based chunking algorithms.

Simple code to test an algorithm is provided in filetest.rs.

Features

  • Chunkers implement the std::iter::Iterator trait and yield chunks that describe the source data.
  • Chunk sizes can be customized when a chunker is created. Default size values are provided.
  • Other parameters from corresponding papers can also be modified on chunker creation.
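
To make the iterator-based design above concrete, here is a minimal, self-contained sketch of a chunker that implements std::iter::Iterator and honours minimum, average and maximum chunk sizes. It is not the crate's implementation: the names Sizes, Chunk, ToyChunker and chunk_all are hypothetical, and the shift-and-add hash is a simple stand-in for the algorithms from the papers.

```rust
/// Hypothetical size limits (not the crate's `SizeParams`).
#[derive(Clone, Copy)]
struct Sizes {
    min: usize,
    avg: usize,
    max: usize,
}

/// Position and length of one chunk within the source buffer.
#[derive(Debug, PartialEq)]
struct Chunk {
    pos: usize,
    len: usize,
}

/// Toy content-defined chunker: cuts where a running hash matches a mask.
struct ToyChunker<'a> {
    data: &'a [u8],
    offset: usize,
    sizes: Sizes,
}

impl<'a> ToyChunker<'a> {
    fn new(data: &'a [u8], sizes: Sizes) -> Self {
        ToyChunker { data, offset: 0, sizes }
    }
}

impl<'a> Iterator for ToyChunker<'a> {
    type Item = Chunk;

    fn next(&mut self) -> Option<Chunk> {
        let remaining = &self.data[self.offset..];
        if remaining.is_empty() {
            return None;
        }
        // A chunk never exceeds `max` (at least 1) or the remaining input.
        let limit = remaining.len().min(self.sizes.max.max(1));
        // With a mask of roughly `avg - 1`, a cut is expected about every `avg` bytes.
        let mask = (self.sizes.avg.next_power_of_two() - 1) as u32;
        let mut hash: u32 = 0;
        let mut len = limit;
        for (i, &byte) in remaining[..limit].iter().enumerate() {
            hash = (hash << 1).wrapping_add(byte as u32);
            // Only consider a cut point once `min` bytes have been consumed.
            if i + 1 >= self.sizes.min && hash & mask == 0 {
                len = i + 1;
                break;
            }
        }
        let chunk = Chunk { pos: self.offset, len };
        self.offset += len;
        Some(chunk)
    }
}

/// Collects every chunk of `data` into a vector.
fn chunk_all(data: &[u8], sizes: Sizes) -> Vec<Chunk> {
    ToyChunker::new(data, sizes).collect()
}

fn main() {
    // Pseudo-random bytes so that cut points depend on content.
    let data: Vec<u8> = (0..100_000u32)
        .map(|i| (i.wrapping_mul(2654435761) >> 24) as u8)
        .collect();
    let sizes = Sizes { min: 1024, avg: 4096, max: 16384 };
    for chunk in ToyChunker::new(&data, sizes) {
        println!("start: {}, length: {}", chunk.pos, chunk.len);
    }
}
```

Because the chunker is an ordinary iterator, it composes with adapters such as map, filter and collect, and the size limits are fixed once at construction.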

Usage

To use them in custom code, the algorithms can be accessed using the corresponding modules, e.g.

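// Assumed import paths (not confirmed by this README); adjust to the crate's actual module layout.
use cdc_chunkers::{leap_based, ultra, SizeParams};

fn main() {
    // 1 MiB of identical bytes as sample input.
    let data = vec![1u8; 1024 * 1024];

    // Custom chunk sizes passed to the chunker on creation.
    let sizes = SizeParams::new(4096, 8192, 16384);
    let chunker = ultra::Chunker::new(&data, sizes);

    // Chunkers are iterators: each item reports where a chunk starts and how long it is.
    for chunk in chunker {
        println!("start: {}, length: {}", chunk.pos, chunk.len);
    }

    // The same data chunked with the leap-based algorithm and its default sizes.
    let default_leap = leap_based::Chunker::new(&data, SizeParams::leap_default());
    for chunk in default_leap {
        println!("start: {}, length: {}", chunk.pos, chunk.len);
    }
}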
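
The chunks printed above carry only a start position and a length, so a caller recovers the bytes by slicing the source buffer. The sketch below illustrates that, along with a check that the chunks tile the input without gaps or overlaps. The Chunk struct here is a hypothetical stand-in with usize fields (the crate's actual field types are an assumption), and the chunk boundaries are hand-written rather than produced by a chunker.

```rust
/// Hypothetical stand-in for a chunk with `pos` and `len` fields.
struct Chunk {
    pos: usize,
    len: usize,
}

/// Returns the bytes of `data` that `chunk` refers to.
fn chunk_bytes<'a>(data: &'a [u8], chunk: &Chunk) -> &'a [u8] {
    &data[chunk.pos..chunk.pos + chunk.len]
}

/// True if the chunks are contiguous, start at 0 and end at `data_len`.
fn covers_exactly(data_len: usize, chunks: &[Chunk]) -> bool {
    let mut expected_pos = 0;
    for chunk in chunks {
        if chunk.pos != expected_pos {
            return false;
        }
        expected_pos += chunk.len;
    }
    expected_pos == data_len
}

fn main() {
    let data = b"hello, chunked world";
    // Hand-written boundaries standing in for a chunker's output.
    let chunks = vec![
        Chunk { pos: 0, len: 7 },
        Chunk { pos: 7, len: 8 },
        Chunk { pos: 15, len: 5 },
    ];
    assert!(covers_exactly(data.len(), &chunks));
    for chunk in &chunks {
        println!("{:?}", String::from_utf8_lossy(chunk_bytes(data, chunk)));
    }
}
```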