fuzzy_prefix_search

Crates.iofuzzy_prefix_search
lib.rsfuzzy_prefix_search
version0.3.6
sourcesrc
created_at2024-08-06 13:41:54.067875
updated_at2024-09-12 05:57:45.943113
descriptionFuzzy search for prefix matches
homepagehttps://github.com/tpisto/fuzzy_prefix_search
repositoryhttps://github.com/tpisto/fuzzy_prefix_search
max_upload_size
id1327307
size50,250
Tommi Pisto (tpisto)

documentation

README

github crates.io docs.rs tests

Fuzzy Prefix Search

Flexible Trie implementation in Rust for fuzzy prefix string searching and auto-completion.

Documentation:

Features

  • Prefix-based fuzzy searching (Levenshtein distance)
  • Fuzzy search with customizable edit distance
  • Multiple data associations per word
  • Jaro-Winkler similarity scoring for search results
  • Thread safe
  • No dependencies

Installation

Add this to your Cargo.toml:

[dependencies]
fuzzy_prefix_search = "0.2"

Usage

Here's a quick example of how to use the Fuzzy Prefix Search:

use fuzzy_prefix_search::Trie;

fn main() {
    let mut trie = Trie::new();

    // Insert words with associated data
    trie.insert("apple", 1);
    trie.insert("application", 2);
    trie.insert("appetite", 3);

    // Perform a fuzzy search
    let results = trie.search_within_distance("appl", 1);

    for result in results {
        println!("Word: {}, Data: {:?}", result.word, result.data);
    }

    // Search with similarity scoring
    let scored_results = trie.search_within_distance_scored("aple", 2);

    for result in scored_results {
        println!("Word: {}, Data: {:?}, Score: {}", result.word, result.data, result.score);
    }
}

Advanced Usage

Custom Data Types

The Trie supports any data type that implements Clone, PartialEq, Eq, and Hash:

#[derive(Clone, PartialEq, Eq, Hash)]
struct CustomData {
    id: u32,
    value: String,
}

let mut trie = Trie::new();
trie.insert("example", CustomData { id: 1, value: "Test".to_string() });

Removing Data

You can remove all occurrences of a specific data value:

trie.remove_all(&2);

Performance

  • O(k) time complexity for insertion, where k is the length of the word
  • Space-efficient storage using a tree structure with shared prefixes
  • TODO: Benchmarks, optimizations, algorithm selection...

Caveat Emptor: we use unsafe in deletes for 2x read performance compared to Rc/RefCell approach.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Resources

License

This project is licensed under the MIT or Apache 2.0 License - see the LICENSE files for details.

Commit count: 0

cargo fmt