ngram_rs

Crates.io: ngram_rs
lib.rs: ngram_rs
version: 0.1.1
created_at: 2025-11-13 22:02:49.995267+00
updated_at: 2025-11-13 22:02:49.995267+00
description: Facilitate creating ngrams in Rust to be used in the polars plugin.
homepage: https://github.com/ericqu/ngram-rs
repository: https://github.com/ericqu/ngram-rs
id: 1931938
size: 13,678
(ericqu)


N-Gram Generation Toolkit

A high-performance n-gram generation library with a Rust core and Polars plugin integration.

Features

  • Fast: Rust implementation optimized for n-gram generation
  • Memory Efficient: Uses Cow (clone-on-write) to minimize allocations
  • Flexible N-Ranges: Generate n-grams for multiple values of n in a single call
  • Custom Delimiters: Support for any string delimiter between tokens
  • Polars Integration: Seamless integration with Polars DataFrames
  • Iterator Support: Lazy n-gram generation for memory-constrained environments
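The "Flexible N-Ranges" and "Custom Delimiters" features can be made concrete with a minimal std-only sketch. The helper `ngrams_multi` is hypothetical and is not the ngram_rs API. It assumes the tokens arrive as a `&[&str]`, slides a window of each requested size over them, and joins each window with the delimiter.

```rust
/// Hypothetical std-only sketch (not the ngram_rs API): build n-grams for
/// every n in `ns`, joining the tokens of each n-gram with `delim`.
fn ngrams_multi(tokens: &[&str], ns: &[usize], delim: &str) -> Vec<String> {
    let mut out = Vec::new();
    for &n in ns {
        // `slice::windows(0)` panics, so skip n == 0.
        // An n larger than tokens.len() simply yields no windows.
        if n == 0 {
            continue;
        }
        out.extend(tokens.windows(n).map(|w| w.join(delim)));
    }
    out
}

fn main() {
    let words = ["the", "quick", "brown", "fox"];
    let grams = ngrams_multi(&words, &[1, 2, 3], " ");
    // prints ["the", "quick", "brown", "fox", "the quick", "quick brown", "brown fox", "the quick brown", "quick brown fox"]
    println!("{grams:?}");
}
```

Four tokens give 4 unigrams, 3 bigrams and 2 trigrams. In general, a sequence of L tokens yields L - n + 1 n-grams when n ≤ L, and none otherwise.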

Components

ngram_rs (Core Library)

The core Rust library providing:

  • Three different APIs for various use cases
  • Optimized implementations for common cases (unigrams, bigrams)
  • Iterator-based lazy generation
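Lazy, iterator-based generation can be sketched with std alone. `ngram_iter` is a hypothetical helper and not the crate's actual iterator API. It returns an iterator that builds each n-gram only when the caller pulls the next item, so nothing is collected up front.

```rust
/// Hypothetical helper (not the ngram_rs API): lazily yields the n-grams of
/// `tokens`, joining each window with `delim`. Each String is built only
/// when the caller asks for the next item.
fn ngram_iter<'a>(
    tokens: &'a [&'a str],
    n: usize,
    delim: &'a str,
) -> impl Iterator<Item = String> + 'a {
    // `slice::windows` panics when the size is 0, so n == 0 yields nothing.
    (n > 0)
        .then(move || tokens.windows(n))
        .into_iter()
        .flatten()
        .map(move |window| window.join(delim))
}

fn main() {
    let words = ["the", "quick", "brown", "fox"];
    // `take(2)` stops the iterator early: only the first two bigrams are built.
    for gram in ngram_iter(&words, 2, " ").take(2) {
        println!("{gram}");
    }
}
```

Because the iterator borrows the token slice instead of copying it, peak memory stays proportional to one n-gram at a time, which is what makes this style useful in memory-constrained settings.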

Quick Start

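use ngram_rs::generate_ngrams;

fn main() {
    // Tokens are passed as owned Strings.
    let words = vec!["the", "quick", "brown", "fox"]
        .into_iter()
        .map(String::from)
        .collect::<Vec<_>>();

    // Generate unigrams, bigrams and trigrams, joining tokens with a space.
    let ngrams = generate_ngrams(&words, &[1, 2, 3], Some(" "));
}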

Performance

The library is optimized for:

  • Minimal memory allocations through Cow
  • Specialized implementations for unigrams and bigrams
  • Efficient windowing algorithms for higher-order n-grams
  • Zero-copy operations where possible
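The Cow and unigram/bigram points above can be illustrated with a minimal sketch, assuming an API that returns `Cow<str>` values. `ngrams_cow` is hypothetical and does not claim to reproduce ngram_rs internals. Unigrams are returned as borrowed slices of the input (no allocation), the bigram path builds one pre-sized `String`, and higher orders fall back to a generic sliding window.

```rust
use std::borrow::Cow;

/// Hypothetical sketch (not the ngram_rs API) of a Cow-based design:
/// unigrams borrow the input tokens, while n >= 2 must build new Strings.
fn ngrams_cow<'a>(tokens: &'a [&'a str], n: usize, delim: &str) -> Vec<Cow<'a, str>> {
    match n {
        0 => Vec::new(),
        // Unigram fast path: each token is returned as a borrowed slice.
        1 => tokens.iter().map(|&t| Cow::Borrowed(t)).collect(),
        // Bigram path: one String sized up front, no intermediate Vec.
        2 => tokens
            .windows(2)
            .map(|w| {
                let mut s = String::with_capacity(w[0].len() + delim.len() + w[1].len());
                s.push_str(w[0]);
                s.push_str(delim);
                s.push_str(w[1]);
                Cow::Owned(s)
            })
            .collect(),
        // General case: slide a window of size n and join it.
        _ => tokens.windows(n).map(|w| Cow::Owned(w.join(delim))).collect(),
    }
}

fn main() {
    let words = ["the", "quick", "brown", "fox"];
    for n in 1..=3 {
        let grams = ngrams_cow(&words, n, " ");
        // Count how many results needed a fresh allocation.
        let owned = grams.iter().filter(|g| matches!(g, Cow::Owned(_))).count();
        println!("n={n}: {} n-grams, {owned} allocated", grams.len());
    }
}
```

Borrowing is only possible for n = 1: any higher-order n-gram has to insert the delimiter between tokens, so it necessarily produces new text.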