Crates.io | ngram-search |
lib.rs | ngram-search |
version | 0.1.1 |
source | src |
created_at | 2020-09-01 21:11:24.400906 |
updated_at | 2020-09-01 21:24:10.748124 |
description | Ngram-based indexing of strings into a binary file |
repository | https://gitlab.com/remram44/ngram-search |
id | 283638 |
size | 63,792 |
This crate allows indexing many strings into a file, and then efficiently fuzzy-matching strings against what's been indexed.
Currently, the whole structure is built in memory before being written to the file, so the indexing phase uses a lot of RAM.
String search is done from the file and requires little memory.
The index is a trie in which trigrams can be looked up; the results for each trigram of the input are matched and sorted to find the most similar strings.
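To make the trigram matching idea concrete, here is a minimal in-memory sketch. It is not the crate's implementation: it uses a `HashMap` instead of an on-disk trie, and the `$` padding, the Jaccard scoring, and the names `TrigramIndex` and `trigrams` are all assumptions chosen for illustration. The crate's actual padding and scoring may differ, so scores produced by this sketch need not match the ones the crate returns.

```rust
use std::collections::{HashMap, HashSet};

/// Split a string into its set of character trigrams.
/// Padding both ends with "$$" is an assumption of this sketch.
fn trigrams(s: &str) -> HashSet<String> {
    let padded: Vec<char> = "$$".chars().chain(s.chars()).chain("$$".chars()).collect();
    padded
        .windows(3)
        .map(|w| w.iter().collect::<String>())
        .collect()
}

/// Hypothetical in-memory stand-in for the on-disk index.
struct TrigramIndex {
    /// trigram -> ids of the strings that contain it
    postings: HashMap<String, Vec<u32>>,
    /// id -> number of distinct trigrams in that string
    sizes: HashMap<u32, usize>,
}

impl TrigramIndex {
    fn new() -> Self {
        TrigramIndex {
            postings: HashMap::new(),
            sizes: HashMap::new(),
        }
    }

    fn add(&mut self, s: &str, id: u32) {
        let grams = trigrams(s);
        self.sizes.insert(id, grams.len());
        for gram in grams {
            self.postings.entry(gram).or_default().push(id);
        }
    }

    /// Return (id, score) pairs with score >= threshold, best match first.
    /// The score is the Jaccard similarity of the two trigram sets
    /// (an assumed metric, not necessarily the crate's).
    fn search(&self, query: &str, threshold: f32) -> Vec<(u32, f32)> {
        let grams = trigrams(query);

        // Count how many query trigrams each indexed string shares.
        let mut shared: HashMap<u32, usize> = HashMap::new();
        for gram in &grams {
            if let Some(ids) = self.postings.get(gram) {
                for &id in ids {
                    *shared.entry(id).or_insert(0) += 1;
                }
            }
        }

        let mut results: Vec<(u32, f32)> = shared
            .into_iter()
            .map(|(id, n)| {
                let union = grams.len() + self.sizes[&id] - n;
                (id, n as f32 / union as f32)
            })
            .filter(|&(_, score)| score >= threshold)
            .collect();

        // Sort by descending score, breaking ties by ascending id.
        results.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then(a.0.cmp(&b.0)));
        results
    }
}
```

In this sketch, each query trigram is looked up once and the per-string hit counts are turned into scores, which mirrors the "look up, match, sort" flow described above. Only strings sharing at least one trigram with the query are ever scored.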
Example:
use std::fs::File;
use std::io::BufWriter;

use ngram_search::Ngrams;

// `path` is the location of the index file, defined elsewhere.

// Build index: each string is added together with a numeric id
let mut builder = Ngrams::builder();
builder.add("spam", 0);
builder.add("ham", 1);
builder.add("mam", 2);

// Write it to a file
let mut file = BufWriter::new(File::create(path).unwrap());
builder.write(&mut file).unwrap();

// Search our index; the second argument is the minimum score to return
let mut data = Ngrams::open(path).unwrap();
assert_eq!(
    data.search("ham", 0.24).unwrap(),
    vec![
        (1, 1.0),  // "ham" is an exact match
        (2, 0.25), // "mam" is close
    ],
);
assert_eq!(
    data.search("spa", 0.2).unwrap(),
    vec![
        (0, 0.22222222), // "spam" is close
    ],
);