Crates.io | trie-rs |
lib.rs | trie-rs |
version | 0.4.2 |
source | src |
created_at | 2019-05-03 03:05:26.101544 |
updated_at | 2024-05-12 19:49:30.983444 |
description | Memory efficient trie (prefix tree) and map library based on LOUDS |
homepage | https://github.com/laysakura/trie-rs |
repository | https://github.com/laysakura/trie-rs |
max_upload_size | |
id | 131651 |
size | 3,836,818 |
Memory efficient trie (prefix tree) and map library based on LOUDS.
Master API Docs | Released API Docs | Benchmark Results | Changelog
To use trie-rs, add the following to your Cargo.toml
file:
[dependencies]
trie-rs = "0.4.2"
use std::str;
use trie_rs::TrieBuilder;
let mut builder = TrieBuilder::new(); // Inferred `TrieBuilder<u8>` automatically
builder.push("すし");
builder.push("すしや");
builder.push("すしだね");
builder.push("すしづめ");
builder.push("すしめし");
builder.push("すしをにぎる");
builder.push("すし"); // Word `push`ed twice is just ignored.
builder.push("🍣");
let trie = builder.build();
// exact_match(): Find a word exactly match to query.
assert_eq!(trie.exact_match("すし"), true);
assert_eq!(trie.exact_match("🍣"), true);
assert_eq!(trie.exact_match("🍜"), false);
// predictive_search(): Find words which include `query` as their prefix.
let results_in_u8s: Vec<Vec<u8>> = trie.predictive_search("すし").collect();
let results_in_str: Vec<String> = trie.predictive_search("すし").collect();
assert_eq!(
results_in_str,
vec![
"すし",
"すしだね",
"すしづめ",
"すしめし",
"すしや",
"すしをにぎる"
] // Sorted by `Vec<u8>`'s order
);
// common_prefix_search(): Find words which is included in `query`'s prefix.
let results_in_u8s: Vec<Vec<u8>> = trie.common_prefix_search("すしや").collect();
let results_in_str: Vec<String> = trie.common_prefix_search("すしや").collect();
assert_eq!(
results_in_str,
vec![
"すし",
"すしや",
] // Sorted by `Vec<u8>`'s order
);
TrieBuilder
is implemented using generic type like following:
impl<Label: Ord> TrieBuilder<Label> {
...
pub fn push<Arr: AsRef<[Label]>>(&mut self, word: Arr) where Label: Clone { ... }
...
}
In the above Usage Overview
example, we used Label=u8, Arr=&str
. If
Label
does not implement Clone
, use
[insert()
][crate::trie::TrieBuilder::insert].
Here shows other Label
and Arr
type examples.
Label=&str, Arr=Vec<&str>
Say Label
is English words and Arr
is English phrases.
use trie_rs::TrieBuilder;
let mut builder = TrieBuilder::new();
builder.push(vec!["a", "woman"]);
builder.push(vec!["a", "woman", "on", "the", "beach"]);
builder.push(vec!["a", "woman", "on", "the", "run"]);
let trie = builder.build();
assert_eq!(
trie.exact_match(vec!["a", "woman", "on", "the", "beach"]),
true
);
let r: Vec<Vec<&str>> = trie.predictive_search(vec!["a", "woman", "on"]).collect();
assert_eq!(
r,
vec![
["a", "woman", "on", "the", "beach"],
["a", "woman", "on", "the", "run"],
],
);
let s: Vec<Vec<&str>> = trie.common_prefix_search(vec!["a", "woman", "on", "the", "beach"]).collect();
assert_eq!(
s,
vec![vec!["a", "woman"], vec!["a", "woman", "on", "the", "beach"]],
);
Label=u8, Arr=[u8; n]
Say Label
is a digit in Pi (= 3.14...) and Arr is a window to separate pi's digit by 10.
use trie_rs::TrieBuilder;
let mut builder = TrieBuilder::<u8>::new(); // Pi = 3.14...
builder.push([1, 4, 1, 5, 9, 2, 6, 5, 3, 5]);
builder.push([8, 9, 7, 9, 3, 2, 3, 8, 4, 6]);
builder.push([2, 6, 4, 3, 3, 8, 3, 2, 7, 9]);
builder.push([6, 9, 3, 9, 9, 3, 7, 5, 1, 0]);
builder.push([5, 8, 2, 0, 9, 7, 4, 9, 4, 4]);
builder.push([5, 9, 2, 3, 0, 7, 8, 1, 6, 4]);
builder.push([0, 6, 2, 8, 6, 2, 0, 8, 9, 9]);
builder.push([8, 6, 2, 8, 0, 3, 4, 8, 2, 5]);
builder.push([3, 4, 2, 1, 1, 7, 0, 6, 7, 9]);
builder.push([8, 2, 1, 4, 8, 0, 8, 6, 5, 1]);
builder.push([3, 2, 8, 2, 3, 0, 6, 6, 4, 7]);
builder.push([0, 9, 3, 8, 4, 4, 6, 0, 9, 5]);
builder.push([5, 0, 5, 8, 2, 2, 3, 1, 7, 2]);
builder.push([5, 3, 5, 9, 4, 0, 8, 1, 2, 8]);
let trie = builder.build();
assert_eq!(trie.exact_match([5, 3, 5, 9, 4, 0, 8, 1, 2, 8]), true);
let t: Vec<Vec<u8>> = trie.predictive_search([3]).collect();
assert_eq!(
t,
vec![
[3, 2, 8, 2, 3, 0, 6, 6, 4, 7],
[3, 4, 2, 1, 1, 7, 0, 6, 7, 9],
],
);
let u: Vec<Vec<u8>> = trie.common_prefix_search([1, 4, 1, 5, 9, 2, 6, 5, 3, 5]).collect();
assert_eq!(
u,
vec![[1, 4, 1, 5, 9, 2, 6, 5, 3, 5]],
);
To store a value with each word, use trie_rs::map::{Trie, TrieBuilder}
.
use std::str;
use trie_rs::map::TrieBuilder;
let mut builder = TrieBuilder::new(); // Inferred `TrieBuilder<u8, u8>` automatically
builder.push("すし", 0);
builder.push("すしや", 1);
builder.push("すしだね", 2);
builder.push("すしづめ", 3);
builder.push("すしめし", 4);
builder.push("すしをにぎる", 5);
builder.push("すし", 6); // Word `push`ed twice uses last value.
builder.push("🍣", 7);
let mut trie = builder.build();
// exact_match(): Find a word exactly match to query.
assert_eq!(trie.exact_match("すし"), Some(&6));
assert_eq!(trie.exact_match("🍣"), Some(&7));
assert_eq!(trie.exact_match("🍜"), None);
// Values can be modified.
let v = trie.exact_match_mut("🍣").unwrap();
*v = 8;
assert_eq!(trie.exact_match("🍣"), Some(&8));
For interactive applications, one can use an incremental search to get the best performance. See [IncSearch][crate::inc_search::IncSearch].
use std::str;
use trie_rs::{TrieBuilder, inc_search::Answer};
let mut builder = TrieBuilder::new(); // Inferred `TrieBuilder<u8, u8>` automatically
builder.push("ab");
builder.push("すし");
builder.push("すしや");
builder.push("すしだね");
builder.push("すしづめ");
builder.push("すしめし");
builder.push("すしをにぎる");
let trie = builder.build();
let mut search = trie.inc_search();
// Query by the byte.
assert_eq!(search.query(&b'a'), Some(Answer::Prefix));
assert_eq!(search.query(&b'c'), None);
assert_eq!(search.query(&b'b'), Some(Answer::Match));
// Reset the query to go again.
search.reset();
// For unicode its easier to use .query_until().
assert_eq!(search.query_until("す"), Ok(Answer::Prefix));
assert_eq!(search.query_until("し"), Ok(Answer::PrefixAndMatch));
assert_eq!(search.query_until("や"), Ok(Answer::Match));
assert_eq!(search.query(&b'a'), None);
assert_eq!(search.query_until("a"), Err(0));
search.reset();
assert_eq!(search.query_until("ab-NO-MATCH-"), Err(2)); // No match on byte at index 2.
map::Trie
associates a Value
with each entry.Value
does not require any traits.Label: Clone
not required to create Trie<Label>
but useful for many reifying search operations like predictive_search()
.Label
at a time.Enables rayon a data parallelism library.
Can determine the size in bytes of nested data structures like the trie itself.
Can serialize and deserialize the trie.
edict.furigana
is used for benchmark.
This file is constructed in the following step:
edict.gz
from EDICT.Many thanks for these dictionaries and tools.
trie-rs uses semantic versioning.
Since current major version is 0, minor version update might involve breaking public API change (although it is carefully avoided).
trie-rs is continuously tested with these Rust versions in with the github CI:
So it is expected to work with Rust 1.75.0 and any newer versions.
Older versions may also work but are not tested or guaranteed.
If support for Rust prior to 1.67.0 is required, trie-rs 0.2.0 supports Rust 1.33.0 and later.
Any kind of pull requests are appreciated.
MIT OR Apache-2.0