Crates.io | fast_symspell |
lib.rs | fast_symspell |
version | 0.1.7 |
source | src |
created_at | 2021-03-08 07:27:06.556588 |
updated_at | 2021-12-04 09:07:38.770345 |
description | Spelling correction & Fuzzy search |
id | 365574 |
size | 74,500 |
A set of modifications to René Klačan's Rust implementation of the brilliant SymSpell, originally written in C# by @wolfgarbe. No changes were made to the API, so this crate can be used as a faster drop-in replacement for the symspell crate.
extern crate symspell;

use symspell::{AsciiStringStrategy, SymSpell, Verbosity};

fn main() {
    let mut symspell: SymSpell<AsciiStringStrategy> = SymSpell::default();

    // Arguments: path, term column index, count column index, separator.
    symspell.load_dictionary("data/frequency_dictionary_en_82_765.txt", 0, 1, " ");
    symspell.load_bigram_dictionary(
        "./data/frequency_bigramdictionary_en_243_342.txt",
        0,
        2,
        " "
    );

    // Single-word correction with a maximum edit distance of 2.
    let suggestions = symspell.lookup("roket", Verbosity::Top, 2);
    println!("{:?}", suggestions);

    // Correction of a whole sentence, including wrongly split or joined words.
    let sentence = "whereis th elove hehad dated forImuch of thepast who couqdn'tread in sixtgrade and ins pired him";
    let compound_suggestions = symspell.lookup_compound(sentence, 2);
    println!("{:?}", compound_suggestions);

    // Splitting text that has no spaces into words.
    let sentence = "whereisthelove";
    let segmented = symspell.word_segmentation(sentence, 2);
    println!("{:?}", segmented);
}
N.B. The dictionary entries have to be lowercase.
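Because entries have to be lowercase, a custom word list may need normalizing before it is passed to load_dictionary. The following is a minimal sketch using only the standard library. lowercase_terms is a hypothetical helper, not part of this crate, and its term_index and separator parameters simply mirror the arguments used in the example above. The sample words and counts are made up for illustration.

```rust
/// Hypothetical helper (not part of the crate): lowercases the term column of
/// a frequency dictionary and leaves every other column untouched.
fn lowercase_terms(corpus: &str, term_index: usize, separator: &str) -> String {
    corpus
        .lines()
        .map(|line| {
            line.split(separator)
                .enumerate()
                .map(|(i, field)| {
                    if i == term_index {
                        field.to_lowercase()
                    } else {
                        field.to_string()
                    }
                })
                .collect::<Vec<_>>()
                .join(separator)
        })
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    // Illustrative input only; the counts are invented.
    let raw = "Rocket 1200\nTHE 5000";
    println!("{}", lowercase_terms(raw, 0, " "));
}
```

This sketch does not merge entries that become identical after lowercasing, so a list containing both "The" and "the" would still need deduplicating.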
let mut symspell: SymSpell<AsciiStringStrategy> = SymSpellBuilder::default()
    .max_dictionary_edit_distance(2)
    .prefix_length(7)
    .count_threshold(1)
    .build()
    .unwrap();
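As general background on the SymSpell algorithm, and not a description of this crate's internals: candidates are found through an index of strings obtained by deleting up to max_dictionary_edit_distance characters from each dictionary term, and prefix_length restricts that work to the first characters of the term. The sketch below illustrates the idea with a hypothetical prefix_deletes function written against the standard library only.

```rust
use std::collections::HashSet;

/// Hypothetical illustration: collects every string reachable from the first
/// `prefix_length` characters of `term` by deleting at most `max_distance`
/// characters. The untouched prefix itself is included.
fn prefix_deletes(term: &str, prefix_length: usize, max_distance: usize) -> HashSet<String> {
    let prefix: String = term.chars().take(prefix_length).collect();
    let mut seen = HashSet::new();
    seen.insert(prefix.clone());

    // Each round removes one more character from the strings found so far.
    let mut frontier = vec![prefix];
    for _ in 0..max_distance {
        let mut next = Vec::new();
        for word in &frontier {
            let chars: Vec<char> = word.chars().collect();
            for skip in 0..chars.len() {
                let shorter: String = chars
                    .iter()
                    .enumerate()
                    .filter(|(i, _)| *i != skip)
                    .map(|(_, c)| *c)
                    .collect();
                // Only expand strings that have not been seen before.
                if seen.insert(shorter.clone()) {
                    next.push(shorter);
                }
            }
        }
        frontier = next;
    }
    seen
}

fn main() {
    let mut deletes: Vec<String> = prefix_deletes("rocket", 7, 1).into_iter().collect();
    deletes.sort();
    println!("{:?}", deletes);
}
```

A larger edit distance or a longer prefix makes this set grow quickly, which is the memory and speed trade-off that these two builder settings control.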
A string strategy is an abstraction over string manipulation, for example preprocessing.
There are two strategies included:
UnicodeStringStrategy
AsciiStringStrategy
To configure the string strategy, pass it as a type parameter:
let mut ascii_symspell: SymSpell<AsciiStringStrategy> = SymSpell::default();
let mut unicode_symspell: SymSpell<UnicodeStringStrategy> = SymSpell::default();
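To show how a strategy chosen through a type parameter can change preprocessing, here is a self-contained sketch. Every name in it (PrepareStrategy, AsciiOnly, UnicodeAsIs, Checker) is hypothetical and does not reflect the crate's actual trait or types. In particular, the ASCII variant here simply drops non-ASCII characters, which may differ from what AsciiStringStrategy really does.

```rust
/// Hypothetical stand-in for a string strategy: one preprocessing hook.
trait PrepareStrategy {
    fn prepare(&self, input: &str) -> String;
}

/// Assumed behaviour for illustration: keep ASCII characters only.
#[derive(Default)]
struct AsciiOnly;

/// Assumed behaviour for illustration: pass the input through unchanged.
#[derive(Default)]
struct UnicodeAsIs;

impl PrepareStrategy for AsciiOnly {
    fn prepare(&self, input: &str) -> String {
        input.chars().filter(|c| c.is_ascii()).collect()
    }
}

impl PrepareStrategy for UnicodeAsIs {
    fn prepare(&self, input: &str) -> String {
        input.to_string()
    }
}

/// Hypothetical checker that is generic over its strategy.
#[derive(Default)]
struct Checker<S: PrepareStrategy> {
    strategy: S,
}

impl<S: PrepareStrategy> Checker<S> {
    /// Runs the strategy's preprocessing step on the input.
    fn normalize(&self, input: &str) -> String {
        self.strategy.prepare(input)
    }
}

fn main() {
    // The strategy is selected purely by the type annotation.
    let ascii: Checker<AsciiOnly> = Checker::default();
    let unicode: Checker<UnicodeAsIs> = Checker::default();
    println!("{}", ascii.normalize("Kla\u{10d}an"));
    println!("{}", unicode.normalize("Kla\u{10d}an"));
}
```

Selecting the strategy at the type level means the choice is fixed at compile time, so no runtime check is needed on each call.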
This crate can be compiled for the wasm32 target and exposes a SymSpell class that can be used from JavaScript as follows. Only UnicodeStringStrategy is exported, meaning that anyone who wants to work with ASCII-only strings must prepare the dictionary and the sentences in advance from JS.
const fs = require('fs');
const rust = require('./pkg');

let dictionary = fs.readFileSync('data/frequency_dictionary_en_82_765.txt');
// bigram_dict was used below without being defined; load it the same way.
let bigram_dict = fs.readFileSync('data/frequency_bigramdictionary_en_243_342.txt');
let sentence = "whereis th elove hehad dated forImuch of thepast who couqdn'tread in sixtgrade and ins pired him";

let symspell = new rust.SymSpell({ max_edit_distance: 2, prefix_length: 7, count_threshold: 1});
symspell.load_dictionary(dictionary.buffer, { term_index: 0, count_index: 1, separator: " "});
symspell.load_bigram_dictionary(bigram_dict.buffer, { term_index: 0, count_index: 2, separator: " "});
symspell.lookup_compound(sentence, 1);
It can be compiled using wasm-pack (e.g. wasm-pack build --release --target nodejs).