# Stringmetrics

This is a Rust library for approximate string matching that implements simple algorithms such as Hamming distance, Levenshtein distance, Jaccard similarity, and more.

Here are some useful quick links:

- Crate info:
- Crate docs:
- Python library page:
- Crate source:

## Algorithms

The main purpose of this library is to provide a variety of string metric functions. Included algorithms are:

- Levenshtein Distance
- Limited & Weighted Levenshtein Distance
- Jaccard Similarity
- Hamming Distance

See [the documentation](https://docs.rs/stringmetrics/) for full information. Some examples are below:

```rs
// Basic Levenshtein distance
use stringmetrics::levenshtein;

assert_eq!(levenshtein("kitten", "sitting"), 3);
```

```rs
// Levenshtein distance with a limit to save computation time
use stringmetrics::levenshtein_limit;

assert_eq!(levenshtein_limit("a very long string", "short!", 4), 4);
```

```rs
// Set custom weights
use stringmetrics::{levenshtein_weight, LevWeights};

// This struct holds insertion, deletion, and substitution costs
let weights = LevWeights::new(4, 3, 2);
assert_eq!(levenshtein_weight("kitten", "sitting", 100, &weights), 8);
```

```rs
// Basic Hamming distance
use stringmetrics::hamming;

let a = "abcdefg";
let b = "aaadefa";
assert_eq!(hamming(a, b), Ok(3));
```

## Future Algorithms & Direction

Eventually, this library aims to add support for more algorithms. Intended work includes:

1. Update Levenshtein distance to use a more performant algorithm for short (<64 characters) and long (>100 characters) strings
2. Add the Damerau–Levenshtein distance
3. Add the Jaro–Winkler distance
4. Add the Tversky index
5. Add cosine similarity
6. Add some useful tokenizers with examples

## License

See the LICENSE file for license information. The provided license does allow for proprietary use and adaptation. That being said, I kindly suggest that if you come up with an improvement, you submit a pull request and help us all out :)
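
## Appendix: Jaccard Similarity Sketch

Jaccard similarity is listed among the included algorithms but has no example above. The snippet below is a rough, standalone illustration of the metric itself, using only the standard library. It is not this crate's API: the function name `jaccard_chars`, the choice to compare sets of characters, and the convention of returning `1.0` for two empty inputs are all assumptions made for the sketch. See the crate documentation for the real function and its signature.

```rs
use std::collections::HashSet;

/// Hypothetical helper (not part of stringmetrics): Jaccard similarity
/// between the sets of characters found in two strings, computed as
/// |A ∩ B| / |A ∪ B|.
fn jaccard_chars(a: &str, b: &str) -> f64 {
    let set_a: HashSet<char> = a.chars().collect();
    let set_b: HashSet<char> = b.chars().collect();

    let union = set_a.union(&set_b).count();
    if union == 0 {
        // Assumed convention: two empty sets are treated as identical.
        return 1.0;
    }

    let intersection = set_a.intersection(&set_b).count();
    intersection as f64 / union as f64
}
```

For "kitten" and "sitting", the character sets share `i`, `t`, and `n` out of seven distinct characters overall, so the score is 3/7. Sets ignore repeated characters and ordering, which is why Jaccard similarity is usually paired with a tokenizer (words, n-grams) rather than applied to raw characters.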