| Crates.io | waken_snowball |
| lib.rs | waken_snowball |
| version | 0.1.0 |
| created_at | 2025-07-24 03:02:57.790175+00 |
| updated_at | 2025-07-24 03:02:57.790175+00 |
| description | Rust implementation of Snowball stemming algorithms for 33 languages |
| homepage | https://snowballstem.org/ |
| repository | https://github.com/snowballstem/snowball |
| max_upload_size | |
| id | 1765473 |
| size | 900,835 |
A Rust implementation of the Snowball stemming algorithms. This library provides stemming functionality for 33 languages, generated directly from the official Snowball compiler.
Arabic, Armenian, Basque, Catalan, Danish, Dutch, Dutch Porter, English, Esperanto, Estonian, Finnish, French, German, Greek, Hindi, Hungarian, Indonesian, Irish, Italian, Lithuanian, Lovins, Nepali, Norwegian, Porter, Portuguese, Romanian, Russian, Serbian, Spanish, Swedish, Tamil, Turkish, Yiddish.
Add this to your Cargo.toml:
[dependencies]
waken_snowball = "0.1.0"
use waken_snowball::{Algorithm, stem};
fn main() {
    // Stem a single word
    let stemmed = stem(Algorithm::English, "running");
    assert_eq!(stemmed, "run");

    // Create a reusable stemmer
    let stemmer = Algorithm::English.stemmer();
    let result = stemmer.stem("jumping");
    assert_eq!(result, "jump");

    // Use different languages
    assert_eq!(stem(Algorithm::French, "finalement"), "final");
    assert_eq!(stem(Algorithm::German, "entwicklung"), "entwickl");
    assert_eq!(stem(Algorithm::Spanish, "programación"), "program");
}
stem(algorithm: Algorithm, word: &str) -> Cow<str>
Stem a single word using the specified algorithm.
Parameters:
algorithm: The stemming algorithm to use
word: The word to stem (should be lowercase for best results)
Returns: The stemmed word. Returns a borrowed reference if unchanged, owned String if modified.
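The borrowed-versus-owned return described above follows the standard `Cow` pattern. The sketch below uses only the standard library to illustrate it. `toy_stem` is a hypothetical stand-in that rewrites a trailing "ies" to "y"; it is not the Snowball algorithm and not this crate's implementation.

```rust
use std::borrow::Cow;

/// Hypothetical toy function (not part of waken_snowball): rewrites a
/// trailing "ies" to "y" and leaves every other word untouched.
fn toy_stem(word: &str) -> Cow<'_, str> {
    match word.strip_suffix("ies") {
        // The word changes, so a new String has to be allocated.
        Some(base) if !base.is_empty() => Cow::Owned(format!("{base}y")),
        // Nothing to rewrite: return the input slice without allocating.
        _ => Cow::Borrowed(word),
    }
}

fn main() {
    for word in ["ponies", "pony"] {
        let stemmed = toy_stem(word);
        let kind = match &stemmed {
            Cow::Borrowed(_) => "borrowed",
            Cow::Owned(_) => "owned",
        };
        println!("{word} -> {stemmed} ({kind})");
    }
}
```

A caller that needs a `String` in every case can call `into_owned()` on the result, which allocates only when the value is still borrowed.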
algorithms() -> &'static [Algorithm]
Get a list of all supported algorithms.
Algorithm
An enum representing all supported stemming algorithms:
pub enum Algorithm {
    Arabic, Armenian, Basque, Catalan, Danish, Dutch, DutchPorter,
    English, Esperanto, Estonian, Finnish, French, German, Greek,
    Hindi, Hungarian, Indonesian, Irish, Italian, Lithuanian,
    Lovins, Nepali, Norwegian, Porter, Portuguese, Romanian,
    Russian, Serbian, Spanish, Swedish, Tamil, Turkish, Yiddish,
}
Methods:
as_str() -> &'static str: Get the string name of the algorithm
stemmer() -> Stemmer: Create a reusable stemmer for this algorithm
from_str(s: &str) -> Option<Algorithm>: Parse algorithm from string name
Stemmer
A reusable stemmer for a specific algorithm:
impl Stemmer {
    pub fn new(algorithm: Algorithm) -> Self;
    pub fn stem<'a>(&self, word: &'a str) -> Cow<'a, str>;
}
use waken_snowball::{Algorithm, stem};
// English stemming
let words = vec!["running", "jumped", "easily"];
for word in words {
    println!("{} -> {}", word, stem(Algorithm::English, word));
}
// Output:
// running -> run
// jumped -> jump
// easily -> easili
use waken_snowball::{Algorithm, stem};
let examples = vec![
    (Algorithm::English, "connection", "connect"),
    (Algorithm::French, "développement", "développ"),
    (Algorithm::German, "programmierung", "programmier"),
    (Algorithm::Spanish, "programación", "program"),
];
for (algorithm, word, expected) in examples {
    let result = stem(algorithm, word);
    assert_eq!(result, expected);
    println!("{}: {} -> {}", algorithm.as_str(), word, result);
}
use waken_snowball::Algorithm;
let stemmer = Algorithm::English.stemmer();
let words = vec!["testing", "stemmer", "functionality"];
for word in words {
    println!("{} -> {}", word, stemmer.stem(word));
}
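A reusable stemmer is typically applied to whole texts rather than single words. The sketch below shows one possible way to do that using only the standard library. `stem_text` and `toy_stem` are hypothetical names introduced here for illustration. `stem_text` accepts any function with the same shape as `Stemmer::stem` (`&str` in, `Cow<str>` out), and it lowercases each token first because the API notes above recommend lowercase input.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for a real stemmer (not the Snowball algorithm):
/// rewrites a trailing "ies" to "y".
fn toy_stem(word: &str) -> Cow<'_, str> {
    match word.strip_suffix("ies") {
        Some(base) if !base.is_empty() => Cow::Owned(format!("{base}y")),
        _ => Cow::Borrowed(word),
    }
}

/// Hypothetical helper: split on whitespace, drop punctuation, lowercase,
/// then apply the supplied stemming function to every remaining token.
fn stem_text<F>(text: &str, stem_word: F) -> Vec<String>
where
    F: for<'a> Fn(&'a str) -> Cow<'a, str>,
{
    text.split_whitespace()
        .map(|token| {
            token
                .chars()
                .filter(|c| c.is_alphanumeric())
                .collect::<String>()
                .to_lowercase()
        })
        // Tokens that were pure punctuation are now empty; skip them.
        .filter(|word| !word.is_empty())
        .map(|word| stem_word(word.as_str()).into_owned())
        .collect()
}

fn main() {
    let stems = stem_text("Ponies, PONIES and a pony!", toy_stem);
    println!("{stems:?}");
}
```

With the real crate, the second argument would presumably be a closure that forwards to a stemmer created once outside the loop, so the stemmer is reused for every token.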
use waken_snowball::Algorithm;
let language = "portuguese";
if let Some(algorithm) = Algorithm::from_str(language) {
    let result = algorithm.stemmer().stem("programação");
    println!("{}: programação -> {}", language, result);
}
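Language names often come from user input or configuration, so a lookup by name usually needs normalization and a fallback. The sketch below illustrates that pattern with a hypothetical three-variant enum `Lang` standing in for `Algorithm`. The names `Lang`, `from_name`, and `resolve` are assumptions made for this example, and the source does not say whether the crate's own `from_str` is case-sensitive.

```rust
/// Hypothetical stand-in for the crate's `Algorithm` enum, cut down to
/// three variants so the example is self-contained.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Lang {
    English,
    French,
    Portuguese,
}

impl Lang {
    const ALL: [Lang; 3] = [Lang::English, Lang::French, Lang::Portuguese];

    /// Lowercase name of the variant.
    fn as_str(self) -> &'static str {
        match self {
            Lang::English => "english",
            Lang::French => "french",
            Lang::Portuguese => "portuguese",
        }
    }

    /// Exact-match lookup by name; returns None for unknown names.
    fn from_name(name: &str) -> Option<Lang> {
        Lang::ALL.iter().copied().find(|lang| lang.as_str() == name)
    }
}

/// Trim and lowercase the input before the lookup, and fall back to a
/// caller-supplied default when the name is not recognized.
fn resolve(input: &str, default: Lang) -> Lang {
    let normalized = input.trim().to_lowercase();
    Lang::from_name(&normalized).unwrap_or(default)
}

fn main() {
    for input in ["portuguese", " French ", "klingon"] {
        let lang = resolve(input, Lang::English);
        println!("{input:?} -> {}", lang.as_str());
    }
}
```

Returning an explicit default keeps the caller in control of what happens for unsupported languages; propagating the `Option` instead is equally reasonable when an unknown name should be treated as an error.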
This library is designed for high performance:
Uses Cow<str> to avoid unnecessary allocations when words are unchanged
The algorithms are generated directly from the official Snowball compiler.
This project is licensed under the BSD 3-Clause License - see the LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.