overlap-chunk

Crates.io: overlap-chunk
lib.rs: overlap-chunk
Created: 2025-02-26 21:52:25.770484+00
Updated: 2025-03-03 21:19:34.18823+00
Repository: https://github.com/katsuhirohonda/overlap-chunk
ID: 1570974
Katsuhiro Honda (katsuhirohonda)

A Rust library for splitting text into chunks of specified size with adjustable overlap percentage.

Features

Current Features

  • Basic functionality to split text into chunks of specified size
  • Option to adjust the overlap percentage between chunks
  • Command-line interface for easy text processing

Future Features

  • Chunking that respects word and sentence boundaries
  • Support for multilingual text
  • Support for streaming input

Library Usage

use overlap_chunk::{chunk_text, ChunkOptions};

fn main() {
    let text = "This is a test text. We will split this long text into smaller chunks.";
    
    // Chunk splitting with default options (no overlap)
    let chunks = chunk_text(text, 10, None);
    println!("{:?}", chunks);
    
    // Chunk splitting with overlap (50% overlap)
    let options = ChunkOptions {
        overlap_percentage: 50,
        ..Default::default()
    };
    let chunks_with_overlap = chunk_text(text, 10, Some(options));
    println!("{:?}", chunks_with_overlap);
}
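
To make the relationship between chunk size and overlap percentage concrete, here is a minimal, self-contained sketch of overlapping chunking. It is not the crate's implementation: the function name `chunk_with_overlap` is hypothetical, and the sketch assumes that size is counted in characters (not bytes) and that the overlap is rounded down to a whole number of characters.

```rust
/// Hypothetical helper, not part of overlap-chunk: splits `text` into chunks
/// of at most `size` characters, where consecutive chunks share roughly
/// `overlap_percentage` percent of their characters.
fn chunk_with_overlap(text: &str, size: usize, overlap_percentage: usize) -> Vec<String> {
    if size == 0 {
        return Vec::new();
    }
    // Work on characters so multi-byte UTF-8 sequences are never split.
    let chars: Vec<char> = text.chars().collect();
    // Number of characters shared between neighbouring chunks (rounded down).
    let overlap = size * overlap_percentage.min(100) / 100;
    // Distance between chunk starts; at least 1 so the loop always advances.
    let step = size.saturating_sub(overlap).max(1);

    let mut chunks: Vec<String> = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + size).min(chars.len());
        chunks.push(chars[start..end].iter().collect());
        if end == chars.len() {
            // The last chunk reached the end of the text.
            break;
        }
        start += step;
    }
    chunks
}

fn main() {
    let text = "abcdefghijklmnopqrstuvwxyz";
    // No overlap: each chunk starts where the previous one ended.
    println!("{:?}", chunk_with_overlap(text, 10, 0));
    // 50% overlap: each chunk starts 5 characters after the previous one.
    println!("{:?}", chunk_with_overlap(text, 10, 50));
}
```

Under these assumptions, a chunk size of 10 with 50% overlap advances 5 characters per chunk, so the 26-letter input yields five chunks instead of three.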

Command Line Usage

The crate also includes a command-line interface for processing text files:

Usage: overlap-chunk [OPTIONS] [FILE]
  If no file is specified, read from standard input

Options:
  -h, --help              Display this help message
  -s, --size SIZE         Specify chunk size (default: 100)
  -o, --overlap PERCENT   Specify overlap percentage between 0 and 90 (default: 0)
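
One plausible reading of the 0 to 90 limit on `--overlap` is that it keeps the distance between chunk starts positive, so chunking always makes progress. The sketch below illustrates that reasoning. Both `parse_overlap` and `step_for` are hypothetical names, and the real CLI may validate and report errors differently.

```rust
/// Hypothetical validation helper: accepts an integer from 0 to 90.
fn parse_overlap(raw: &str) -> Result<u8, String> {
    let message = || format!("overlap must be an integer between 0 and 90, got {raw:?}");
    let value: u8 = raw.trim().parse().map_err(|_| message())?;
    if value > 90 {
        return Err(message());
    }
    Ok(value)
}

/// Distance between consecutive chunk starts, assuming the overlap is
/// rounded down to whole units. With a percentage of at most 90 the
/// overlap is always smaller than `size`, so the result is at least 1
/// for any `size >= 1`.
fn step_for(size: usize, overlap_percentage: u8) -> usize {
    let overlap = size * usize::from(overlap_percentage) / 100;
    size - overlap
}

fn main() {
    for raw in ["0", "30", "90", "95", "abc"] {
        match parse_overlap(raw) {
            Ok(p) => println!("{raw}: ok, step for size 100 = {}", step_for(100, p)),
            Err(e) => println!("{raw}: error: {e}"),
        }
    }
}
```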

Examples

Process a file with default settings:

overlap-chunk myfile.txt

Process a file with custom chunk size and overlap:

overlap-chunk -s 50 -o 30 myfile.txt

Process standard input:

cat myfile.txt | overlap-chunk -s 50
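
The examples above rely on the tool reading from a file when one is named and from standard input otherwise. A minimal sketch of that selection logic, using only the Rust standard library, could look like the following. The helper `read_input` is hypothetical and is not taken from the crate's source.

```rust
use std::fs;
use std::io::{self, Read};

/// Hypothetical helper: reads the whole input from `path` if one is given,
/// otherwise from `fallback` (standard input in the real program).
fn read_input(path: Option<&str>, mut fallback: impl Read) -> io::Result<String> {
    match path {
        Some(p) => fs::read_to_string(p),
        None => {
            let mut buf = String::new();
            fallback.read_to_string(&mut buf)?;
            Ok(buf)
        }
    }
}

fn main() -> io::Result<()> {
    // For this sketch, the first positional argument (if any) is the file name.
    let path = std::env::args().nth(1);
    let text = read_input(path.as_deref(), io::stdin())?;
    println!("read {} characters", text.chars().count());
    Ok(())
}
```

Taking the fallback as any `Read` implementation keeps the helper easy to test with an in-memory byte slice instead of a real terminal.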

License

MIT License
