| | |
|---|---|
| Crates.io | natural |
| lib.rs | natural |
| version | 0.5.0 |
| source | src |
| created_at | 2017-03-07 13:14:35.934646 |
| updated_at | 2020-02-13 07:02:07.166508 |
| description | Pure Rust library for natural language processing. |
| homepage | |
| repository | https://github.com/cjqed/rs-natural |
| max_upload_size | |
| id | 8874 |
| size | 391,328 |
Natural language processing library written in Rust. Still very much a work in progress. Basically an experiment, but hey, maybe something cool will come out of it.
Currently working:

- Jaro-Winkler and Levenshtein string distance
- Phonetics (SoundEx)
- Tokenization
- N-grams (with and without padding)
- Naive Bayes classification
- TF-IDF

Near-sight goals:
Use at your own risk. Some functionality is missing, and some of it is slow as molasses because it isn't optimized yet. I'm targeting master and don't offer backward compatibility.
It's a crate with a Cargo.toml. Add this to your Cargo.toml:
[dependencies]
natural = "0.5.0"

# Or enable Serde support:
natural = { version = "0.5.0", features = ["serde_support"] }
serde = "1.0"
extern crate natural;

use natural::distance::jaro_winkler_distance;
use natural::distance::levenshtein_distance;

// Number of single-character edits between the two strings.
assert_eq!(levenshtein_distance("kitten", "sitting"), 3);
// Jaro-Winkler returns a float; see the note below on comparing it.
assert_eq!(jaro_winkler_distance("dixon", "dicksonx"), 0.767);
Note: don't actually assert_eq! on jaro_winkler_distance, since it returns an f64. To test, I actually use:

fn f64_eq(a: f64, b: f64) {
    assert!((a - b).abs() < 0.01);
}
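Put together, a minimal check of the Jaro-Winkler value quoted above might look like this (assuming the function returns f64, as the note says):

extern crate natural;

use natural::distance::jaro_winkler_distance;

fn f64_eq(a: f64, b: f64) {
    assert!((a - b).abs() < 0.01);
}

fn main() {
    // Compare within a tolerance instead of exact float equality.
    f64_eq(jaro_winkler_distance("dixon", "dicksonx"), 0.767);
}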
There are two ways to use the SoundEx algorithm in this library: either through a simple soundex function that accepts two &str parameters and returns a bool, or through the SoundexWord struct. I will show both here.
extern crate natural;

use natural::phonetics::soundex;
use natural::phonetics::SoundexWord;

// One-shot comparison of two words.
assert!(soundex("rupert", "robert"));

// Or build SoundexWord values and compare them.
let s1 = SoundexWord::new("rupert");
let s2 = SoundexWord::new("robert");
assert!(s1.sounds_like(s2));
assert!(s1.sounds_like_str("robert"));
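As a quick usage sketch, the plain soundex function makes it easy to filter a word list for sound-alikes. The candidate words here are made up for illustration; only the soundex function shown above is used:

extern crate natural;

use natural::phonetics::soundex;

fn main() {
    let candidates = ["robert", "rubin", "roberto", "rotten"];
    // Keep only the words that sound like "rupert".
    let sound_alikes: Vec<&str> = candidates
        .iter()
        .cloned()
        .filter(|&word| soundex("rupert", word))
        .collect();
    println!("{:?}", sound_alikes);
}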
extern crate natural;
use natural::tokenize::tokenize;
assert_eq!(tokenize("hello, world!"), vec!["hello", "world"]);
assert_eq!(tokenize("My dog has fleas."), vec!["My", "dog", "has", "fleas"]);
You can create n-grams with or without padding, e.g.:
extern crate natural;
use natural::ngram::get_ngram;
use natural::ngram::get_ngram_with_padding;
assert_eq!(get_ngram("hello my darling", 2), vec![vec!["hello", "my"], vec!["my", "darling"]]);
assert_eq!(get_ngram_with_padding("my fleas", 2, "----"), vec![
vec!["----", "my"], vec!["my", "fleas"], vec!["fleas", "----"]]);
extern crate natural;

use natural::classifier::NaiveBayesClassifier;

let mut nbc = NaiveBayesClassifier::new();

// Train with as many (text, label) pairs as you have.
nbc.train(STRING_TO_TRAIN, LABEL);
nbc.train(STRING_TO_TRAIN, LABEL);
nbc.train(STRING_TO_TRAIN, LABEL);
nbc.train(STRING_TO_TRAIN, LABEL);

nbc.guess(STRING_TO_GUESS); // returns the label with the highest probability
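Filled in with concrete values, a sketch might look like this. The spam/ham strings and labels are made up, and I'm assuming train and guess take plain string slices, as the placeholders above suggest:

extern crate natural;

use natural::classifier::NaiveBayesClassifier;

fn main() {
    let mut nbc = NaiveBayesClassifier::new();

    // A couple of made-up training documents per label.
    nbc.train("order cheap watches now", "spam");
    nbc.train("win a free prize today", "spam");
    nbc.train("meeting notes from tuesday", "ham");
    nbc.train("lunch with the team tomorrow", "ham");

    // guess returns the label with the highest probability.
    println!("{}", nbc.guess("free watches and a prize"));
}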
extern crate natural;

use natural::tf_idf::TfIdf;

let mut tf_idf = TfIdf::new(); // assuming a new() constructor, mirroring the classifier above

tf_idf.add("this document is about rust.");
tf_idf.add("this document is about erlang.");
tf_idf.add("this document is about erlang and rust.");
tf_idf.add("this document is about rust. it has rust examples");

println!("{}", tf_idf.get("rust")); // 0.2993708f32
println!("{}", tf_idf.get("erlang")); // 0.13782766f32

// Multiple terms are scored as the average of the individual term scores.
println!("{}", tf_idf.get("rust erlang")); // 0.21859923