Crates.io | clustr |
lib.rs | clustr |
version | 0.1.2 |
source | src |
created_at | 2022-08-19 12:42:12.863225 |
updated_at | 2022-08-20 12:10:08.231777 |
description | Multithreaded string clustering |
homepage | https://github.com/TristanBester/clustr |
repository | https://github.com/TristanBester/clustr |
id | 648676 |
size | 24,644 |
Documentation: https://docs.rs/clustr/0.1.2/clustr/
Crate: https://crates.io/crates/clustr
Source Code: https://github.com/TristanBester/clustr
This crate provides a scalable, multithreaded string clustering implementation.
Strings are grouped into clusters based on their pairwise Levenshtein (edit) distance: two strings are placed in the same cluster if their distance is at most a set fraction of the shorter string’s length.
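To make the merge rule concrete, here is a minimal sketch of the pairwise test in plain Rust (standard library only). It is not clustr's implementation. The helper names `levenshtein` and `should_merge` are hypothetical, distances are counted over Unicode scalar values (`char`s), and the comparison is assumed to be inclusive (distance <= fraction * shorter length), which matches the crate's examples, where identical strings still cluster with a fraction of 0.0.

```rust
/// Levenshtein distance via the classic two-row dynamic-programming table,
/// counted over Unicode scalar values (`char`s).
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // `prev` holds the previous row of the table; row 0 is 0, 1, ..., b.len().
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0; b.len() + 1];
    for i in 1..=a.len() {
        curr[0] = i; // distance from the first i chars of `a` to the empty string
        for j in 1..=b.len() {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            curr[j] = (prev[j] + 1) // deletion
                .min(curr[j - 1] + 1) // insertion
                .min(prev[j - 1] + cost); // substitution (or match)
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    // After the final swap, `prev` is the last computed row.
    prev[b.len()]
}

/// Hypothetical merge test (assumption, not clustr's API): true if the edit
/// distance is at most `max_edit_frac` times the length of the shorter string.
fn should_merge(a: &str, b: &str, max_edit_frac: f32) -> bool {
    let shorter = a.chars().count().min(b.chars().count());
    (levenshtein(a, b) as f32) <= max_edit_frac * shorter as f32
}

fn main() {
    // Pairs from the basic usage example, with a maximum edit fraction of 0.25.
    assert_eq!(levenshtein("aaaa", "aaax"), 1);
    assert!(should_merge("aaaa", "aaax", 0.25)); // 1 <= 0.25 * 4
    assert_eq!(levenshtein("aaaa", "bbbb"), 4);
    assert!(!should_merge("aaaa", "bbbb", 0.25)); // 4 > 0.25 * 4
}
```

For the basic usage example, "aaaa" and "aaax" differ by one substitution and the threshold is 0.25 * 4 = 1, so they merge; "aaaa" and "bbbb" are at distance 4 and stay apart.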
Add the crate to your Cargo.toml:
[dependencies]
clustr = "0.1.2"
Basic usage:
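fn main() -> Result<(), Box<dyn std::error::Error>> {
    let inputs = vec!["aaaa", "aaax", "bbbb", "bbbz"];
    let expected = vec![vec!["aaaa", "aaax"], vec!["bbbb", "bbbz"]];
    // Maximum edit fraction 0.25, one thread.
    let clusters = clustr::cluster_strings(&inputs, 0.25, 1)?;
    assert_eq!(clusters, expected);
    Ok(())
}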
Multithreading:
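fn main() -> Result<(), Box<dyn std::error::Error>> {
    let inputs = vec!["aa", "bb", "aa", "bb"];
    let expected = vec![vec!["aa", "aa"], vec!["bb", "bb"]];
    // Maximum edit fraction 0.0 (only identical strings cluster), four threads.
    let results = clustr::cluster_strings(&inputs, 0.0, 4)?;
    // The order of the returned clusters is nondeterministic,
    // so check membership rather than exact equality.
    for e in expected {
        assert!(results.contains(&e));
    }
    Ok(())
}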
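One way the pairwise comparisons could be spread across threads is sketched here as a hypothetical illustration, not clustr's actual code. The names `similar_pairs`, `find`, and `group` are made up; the sketch uses `std::thread::scope` (Rust 1.63+), assigns rows of the comparison matrix to threads round-robin, and assumes that clusters are the connected components of the "similar" relation (single linkage), built with a small union-find. With a fraction of 0.0 the merge criterion reduces to string equality, so the demo passes `a == b` as the predicate.

```rust
use std::collections::HashMap;
use std::thread;

/// Hypothetical: find all index pairs (i, j), i < j, for which `similar` holds,
/// splitting rows of the pairwise comparison across `n_threads` scoped threads.
fn similar_pairs<F>(items: &[&str], n_threads: usize, similar: F) -> Vec<(usize, usize)>
where
    F: Fn(&str, &str) -> bool + Sync,
{
    let n_threads = n_threads.max(1); // step_by(0) would panic
    let similar = &similar;
    thread::scope(|s| {
        let handles: Vec<_> = (0..n_threads)
            .map(|t| {
                s.spawn(move || {
                    let mut pairs = Vec::new();
                    // Thread t handles rows t, t + n_threads, t + 2 * n_threads, ...
                    for i in (t..items.len()).step_by(n_threads) {
                        for j in (i + 1)..items.len() {
                            if similar(items[i], items[j]) {
                                pairs.push((i, j));
                            }
                        }
                    }
                    pairs
                })
            })
            .collect();
        // Join in spawn order and concatenate each thread's pairs.
        handles
            .into_iter()
            .flat_map(|h| h.join().unwrap())
            .collect()
    })
}

/// Union-find root lookup with path halving.
fn find(parent: &mut [usize], mut x: usize) -> usize {
    while parent[x] != x {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    x
}

/// Hypothetical: merge indices connected by similar pairs (single linkage)
/// and return groups in order of each group's first member.
fn group<'a>(items: &[&'a str], pairs: &[(usize, usize)]) -> Vec<Vec<&'a str>> {
    let mut parent: Vec<usize> = (0..items.len()).collect();
    for &(i, j) in pairs {
        let (ri, rj) = (find(&mut parent, i), find(&mut parent, j));
        if ri != rj {
            parent[rj] = ri;
        }
    }
    let mut root_to_group: HashMap<usize, usize> = HashMap::new();
    let mut groups: Vec<Vec<&'a str>> = Vec::new();
    for (i, &item) in items.iter().enumerate() {
        let root = find(&mut parent, i);
        let g = *root_to_group.entry(root).or_insert_with(|| {
            groups.push(Vec::new());
            groups.len() - 1
        });
        groups[g].push(item);
    }
    groups
}

fn main() {
    let inputs = vec!["aa", "bb", "aa", "bb"];
    // With a fraction of 0.0, only identical strings satisfy the criterion.
    let pairs = similar_pairs(&inputs, 4, |a, b| a == b);
    let clusters = group(&inputs, &pairs);
    assert_eq!(clusters, vec![vec!["aa", "aa"], vec!["bb", "bb"]]);
}
```

Because groups are emitted in order of their first member, this sketch's output order does not depend on thread scheduling.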