waken_snowball

version: 0.1.0
created_at: 2025-07-24 03:02:57.790175+00
updated_at: 2025-07-24 03:02:57.790175+00
description: Rust implementation of Snowball stemming algorithms for 33 languages
homepage: https://snowballstem.org/
repository: https://github.com/snowballstem/snowball
id: 1765473
size: 900,835
owner: waken (mc373906408)
documentation: https://docs.rs/waken_snowball

README

Snowball Stemmer for Rust


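A Rust implementation of the Snowball stemming algorithms. This library provides 33 stemming algorithms covering 30 languages (Porter and Lovins are additional English algorithms, and Dutch Porter is an additional Dutch variant), with code generated by the official Snowball compiler.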

Features

  • 🌍 33 Algorithms, 30 Languages: Arabic, Armenian, Basque, Catalan, Danish, Dutch, English, French, German, and many more
  • 🚀 High Performance: Compiled Rust code with zero-cost abstractions
  • 🔒 Memory Safe: Pure Rust implementation with no unsafe code
  • 📦 Easy to Use: Simple API with both a one-shot stem function and a reusable Stemmer type
  • Well Tested: Comprehensive test suite ensuring correctness

Supported Languages

Arabic, Armenian, Basque, Catalan, Danish, Dutch, Dutch Porter, English, Esperanto, Estonian, Finnish, French, German, Greek, Hindi, Hungarian, Indonesian, Irish, Italian, Lithuanian, Lovins, Nepali, Norwegian, Porter, Portuguese, Romanian, Russian, Serbian, Spanish, Swedish, Tamil, Turkish, Yiddish.

Installation

Add this to your Cargo.toml:

[dependencies]
waken_snowball = "0.1.0"

Quick Start

use waken_snowball::{Algorithm, stem};

fn main() {
    // Stem a single word
    let stemmed = stem(Algorithm::English, "running");
    assert_eq!(stemmed, "run");
    
    // Create a reusable stemmer
    let stemmer = Algorithm::English.stemmer();
    let result = stemmer.stem("jumping");
    assert_eq!(result, "jump");
    
    // Use different languages
    assert_eq!(stem(Algorithm::French, "finalement"), "final");
    assert_eq!(stem(Algorithm::German, "entwicklung"), "entwickl");
    assert_eq!(stem(Algorithm::Spanish, "programación"), "program");
}

API Reference

Functions

stem(algorithm: Algorithm, word: &str) -> Cow<str>

Stem a single word using the specified algorithm.

Parameters:

  • algorithm: The stemming algorithm to use
  • word: The word to stem (should be lowercase for best results)

Returns: The stemmed word as a Cow<str>: borrowed from word if the stemmer left it unchanged, an owned String if it was modified.
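Because word should be lowercase for best results, callers usually normalize input before stemming. The sketch below shows one way to do that in the same Cow style, allocating only when a character actually changes. lowercase_if_needed is a hypothetical helper, not part of this crate. Note that Rust's str::to_lowercase is locale-independent, so a Turkish capital I becomes i rather than dotless ı; Turkish input may need custom handling.

```rust
use std::borrow::Cow;

/// Hypothetical helper (not part of waken_snowball): lowercase a word,
/// borrowing it unchanged when no character would change.
fn lowercase_if_needed(word: &str) -> Cow<'_, str> {
    // A char needs rewriting if its lowercase mapping differs from the char itself.
    let needs_change = word
        .chars()
        .any(|c| c.to_lowercase().ne(std::iter::once(c)));
    if needs_change {
        Cow::Owned(word.to_lowercase())
    } else {
        Cow::Borrowed(word)
    }
}

fn main() {
    for word in ["running", "Running", "ÉCOLE"] {
        let normalized = lowercase_if_needed(word);
        let kind = if matches!(normalized, Cow::Borrowed(_)) { "borrowed" } else { "owned" };
        // The normalized word would then be passed on, e.g. stem(Algorithm::French, &normalized).
        println!("{} -> {} ({})", word, normalized, kind);
    }
}
```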

algorithms() -> &'static [Algorithm]

Get a list of all supported algorithms.
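A function returning &'static [Algorithm] is typically backed by a static array, so listing the algorithms needs no allocation at runtime. The following is a minimal sketch of that pattern with a three-variant stand-in enum; it is an illustration, not the crate's actual source.

```rust
/// Stand-in for the crate's Algorithm enum (only three variants here).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Algorithm {
    English,
    French,
    German,
}

/// Every variant, stored once in static memory.
static ALL: [Algorithm; 3] = [Algorithm::English, Algorithm::French, Algorithm::German];

/// Mirrors the documented signature algorithms() -> &'static [Algorithm].
fn algorithms() -> &'static [Algorithm] {
    &ALL
}

fn main() {
    // Iterating borrows the static slice; nothing is copied or allocated.
    for algorithm in algorithms() {
        println!("{:?}", algorithm);
    }
    println!("{} algorithms available", algorithms().len());
}
```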

Types

Algorithm

An enum representing all supported stemming algorithms:

pub enum Algorithm {
    Arabic, Armenian, Basque, Catalan, Danish, Dutch, DutchPorter,
    English, Esperanto, Estonian, Finnish, French, German, Greek,
    Hindi, Hungarian, Indonesian, Irish, Italian, Lithuanian,
    Lovins, Nepali, Norwegian, Porter, Portuguese, Romanian,
    Russian, Serbian, Spanish, Swedish, Tamil, Turkish, Yiddish,
}

Methods:

  • as_str() -> &'static str: Get the string name of the algorithm
  • stemmer() -> Stemmer: Create a reusable stemmer for this algorithm
  • from_str(s: &str) -> Option<Algorithm>: Parse algorithm from string name
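as_str and from_str are meant to be inverses, which makes it easy to store an algorithm choice as a plain string in configuration. The sketch below shows that round trip on a stand-in enum. The lowercase names ("english", "portuguese") are an assumption based on the dynamic-selection example further down; the crate's exact spellings may differ. Note also that this from_str returns Option, unlike the standard FromStr trait, which returns Result.

```rust
/// Stand-in for the crate's Algorithm enum, reduced to three variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Algorithm {
    English,
    Portuguese,
    Spanish,
}

impl Algorithm {
    /// Assumed lowercase names; the real crate's strings may differ.
    fn as_str(self) -> &'static str {
        match self {
            Algorithm::English => "english",
            Algorithm::Portuguese => "portuguese",
            Algorithm::Spanish => "spanish",
        }
    }

    /// Inverse of as_str; unknown names yield None instead of panicking.
    fn from_str(s: &str) -> Option<Algorithm> {
        match s {
            "english" => Some(Algorithm::English),
            "portuguese" => Some(Algorithm::Portuguese),
            "spanish" => Some(Algorithm::Spanish),
            _ => None,
        }
    }
}

fn main() {
    // Round trip: name -> variant -> name, with a graceful miss.
    for name in ["english", "portuguese", "klingon"] {
        match Algorithm::from_str(name) {
            Some(a) => println!("{} -> {:?} -> {}", name, a, a.as_str()),
            None => println!("{}: no stemmer available", name),
        }
    }
}
```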

Stemmer

A reusable stemmer for a specific algorithm:

impl Stemmer {
    pub fn new(algorithm: Algorithm) -> Self;
    pub fn stem<'a>(&self, word: &'a str) -> Cow<'a, str>;
}
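The benefit of a reusable Stemmer is that the algorithm is resolved once, at construction, rather than on every call. One plausible design, sketched below with hypothetical names, stores a function pointer chosen in new. The two toy suffix rules only stand in for real Snowball algorithms; they are not the English or Spanish stemmers.

```rust
use std::borrow::Cow;

/// Stand-in for the crate's Algorithm enum (two toy variants only).
#[derive(Debug, Clone, Copy)]
enum Algorithm {
    ToyEnglish,
    ToySpanish,
}

/// Toy rule: strip a trailing "ing" (NOT the real English Snowball algorithm).
fn toy_english(word: &str) -> Cow<'_, str> {
    match word.strip_suffix("ing") {
        Some(stem) if !stem.is_empty() => Cow::Owned(stem.to_string()),
        _ => Cow::Borrowed(word),
    }
}

/// Toy rule: strip a trailing "s" (NOT the real Spanish Snowball algorithm).
fn toy_spanish(word: &str) -> Cow<'_, str> {
    match word.strip_suffix('s') {
        Some(stem) if !stem.is_empty() => Cow::Owned(stem.to_string()),
        _ => Cow::Borrowed(word),
    }
}

/// Hypothetical reusable stemmer: the algorithm is resolved once, in new().
struct Stemmer {
    stem_fn: for<'a> fn(&'a str) -> Cow<'a, str>,
}

impl Stemmer {
    fn new(algorithm: Algorithm) -> Self {
        // The only dispatch on the algorithm happens here, not per word.
        let stem_fn: for<'a> fn(&'a str) -> Cow<'a, str> = match algorithm {
            Algorithm::ToyEnglish => toy_english,
            Algorithm::ToySpanish => toy_spanish,
        };
        Stemmer { stem_fn }
    }

    /// Same shape as the documented stem<'a>(&self, word: &'a str) -> Cow<'a, str>.
    fn stem<'a>(&self, word: &'a str) -> Cow<'a, str> {
        (self.stem_fn)(word)
    }
}

fn main() {
    let english = Stemmer::new(Algorithm::ToyEnglish);
    for word in ["testing", "stemmer", "jumping"] {
        println!("{} -> {}", word, english.stem(word));
    }
    let spanish = Stemmer::new(Algorithm::ToySpanish);
    println!("casas -> {}", spanish.stem("casas"));
}
```

Because the returned Cow borrows only from word and not from self, one Stemmer can be shared across many calls without tying results to the stemmer's lifetime.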

Examples

Basic Usage

use waken_snowball::{Algorithm, stem};

// English stemming
let words = vec!["running", "jumped", "easily"];
for word in words {
    println!("{} -> {}", word, stem(Algorithm::English, word));
}
// Output:
// running -> run
// jumped -> jump
// easily -> easili
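Stems need not be dictionary words. "easily" becomes "easili" because step 1c of the English (Porter2) algorithm replaces a final y with i when it follows a non-vowel that is not the first letter of the word, so "cry" becomes "cri" while "by" and "say" are unchanged. The sketch below implements only that one step, for lowercase input, to make the rule concrete; step_1c is a hypothetical name, and this is not the full stemmer.

```rust
/// Porter2 vowels (y counts as a vowel in this check).
fn is_vowel(c: char) -> bool {
    matches!(c, 'a' | 'e' | 'i' | 'o' | 'u' | 'y')
}

/// Hypothetical helper implementing only Porter2 step 1c:
/// replace a final 'y' with 'i' if it is preceded by a non-vowel
/// that is not the first letter of the word.
fn step_1c(word: &str) -> String {
    let chars: Vec<char> = word.chars().collect();
    let n = chars.len();
    // n >= 3 guarantees the preceding letter (index n - 2) is not the first letter.
    if n >= 3 && chars[n - 1] == 'y' && !is_vowel(chars[n - 2]) {
        let mut out: String = chars[..n - 1].iter().collect();
        out.push('i');
        out
    } else {
        word.to_string()
    }
}

fn main() {
    for word in ["easily", "cry", "by", "say"] {
        println!("{} -> {}", word, step_1c(word));
    }
}
```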

Multiple Languages

use waken_snowball::{Algorithm, stem};

let examples = vec![
    (Algorithm::English, "connection", "connect"),
    (Algorithm::French, "développement", "développ"),
    (Algorithm::German, "programmierung", "programmier"),
    (Algorithm::Spanish, "programación", "program"),
];

for (algorithm, word, expected) in examples {
    let result = stem(algorithm, word);
    assert_eq!(result, expected);
    println!("{}: {} -> {}", algorithm.as_str(), word, result);
}

Reusable Stemmer

use waken_snowball::Algorithm;

let stemmer = Algorithm::English.stemmer();
let words = vec!["testing", "stemmer", "functionality"];

for word in words {
    println!("{} -> {}", word, stemmer.stem(word));
}
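In practice a reusable stemmer usually sits at the end of a small pipeline: split text into tokens, lowercase them, then stem each token. The sketch below keeps the stemming step pluggable through a function parameter. stem_text and toy_stem are hypothetical stand-ins, and splitting on non-alphanumeric characters is a simplifying assumption; real tokenizers treat apostrophes, hyphens, and scripts without spaces differently.

```rust
use std::borrow::Cow;

/// Toy stand-in for stemmer.stem: strips a trailing "ing" when at least three letters remain.
fn toy_stem(word: &str) -> Cow<'_, str> {
    match word.strip_suffix("ing") {
        Some(stem) if stem.chars().count() >= 3 => Cow::Owned(stem.to_string()),
        _ => Cow::Borrowed(word),
    }
}

/// Hypothetical pipeline: split on non-alphanumeric characters, lowercase, then stem.
fn stem_text<F>(text: &str, stem: F) -> Vec<String>
where
    F: for<'a> Fn(&'a str) -> Cow<'a, str>,
{
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|token| !token.is_empty())
        .map(|token| {
            let lower = token.to_lowercase();
            // into_owned() copies the stem out before `lower` is dropped.
            stem(lower.as_str()).into_owned()
        })
        .collect()
}

fn main() {
    let stems = stem_text("Testing, testing: the Stemmer is working!", toy_stem);
    println!("{:?}", stems);
}
```

With the crate, a closure such as |w| stemmer.stem(w) should fit the same parameter, since Stemmer::stem returns a Cow tied only to the word's lifetime.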

Dynamic Algorithm Selection

use waken_snowball::Algorithm;

let language = "portuguese";
if let Some(algorithm) = Algorithm::from_str(language) {
    let result = algorithm.stemmer().stem("programação");
    println!("{}: programação -> {}", language, result);
}

Performance

This library is designed for high performance:

  • Zero-cost abstractions: The Rust implementation has minimal overhead
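  • Compiled algorithms: The Snowball compiler translates every stemming rule into Rust source, which is compiled to native code rather than interpreted from rule files at runtime
  • Memory efficient: Returns Cow<str>, so a word the stemmer leaves unchanged is handed back as a borrow of the input with no allocation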
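To see what Cow<str> saves, compare the two variants directly: a borrowed result points into the caller's own buffer, so nothing was copied. The sketch below uses a toy plural-stripping rule as a stand-in for a real algorithm and checks pointer identity; toy_stem and count_allocations are hypothetical, and the code demonstrates the return-type pattern, not this crate's internals.

```rust
use std::borrow::Cow;

/// Toy stand-in for a stemmer: strips one trailing 's' (not a Snowball rule).
fn toy_stem(word: &str) -> Cow<'_, str> {
    match word.strip_suffix('s') {
        Some(stem) if !stem.is_empty() => Cow::Owned(stem.to_string()),
        _ => Cow::Borrowed(word),
    }
}

/// Counts how many words needed a fresh String (the Cow::Owned case).
fn count_allocations(words: &[&str]) -> usize {
    words
        .iter()
        .filter(|w| matches!(toy_stem(w), Cow::Owned(_)))
        .count()
}

fn main() {
    let input = String::from("stemmer");
    let out = toy_stem(&input);
    // Unchanged word: the result points at the caller's own buffer.
    println!("same buffer: {}", out.as_ptr() == input.as_ptr());

    let words = ["cats", "dog", "trees", "sky"];
    println!("{} of {} words allocated", count_allocations(&words), words.len());
}
```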

Accuracy

The algorithms are generated by the official Snowball compiler from the upstream algorithm definitions, ensuring:

  • 100% compatibility with reference implementations
  • Identical results to other Snowball ports (Python, Java, C, etc.) generated from the same Snowball version
  • Regular updates when upstream algorithms are improved

License

This project is licensed under the BSD 3-Clause License - see the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
