editdistancek

Crates.ioeditdistancek
lib.rseditdistancek
version1.0.2
sourcesrc
created_at2022-08-03 23:10:54.752302
updated_at2023-06-29 16:50:49.449414
descriptionFast algorithm for computing edit distance
homepage
repositoryhttps://github.com/nkkarpov/editdistancek
max_upload_size
id638388
size30,337
Nikolai Karpov (nkkarpov)

documentation

https://github.com/nkkarpov/editdistancek

README

editdistancek

This library calculates the Levenshtein edit distance between two strings. The edit distance represents the minimum number of edits required to transform one string into the other.

Our method is inspired by the algorithms of Ukkonen and Landau, Myers, and Schmidt. These strategies offer efficient ways of calculating the edit distance.

The algorithm operates with a running time of O(d*min(n,m)) and O(d^2 + n) in expectation. Here, 'd' stands for the edit distance between the two strings, while 'n' and 'm' denote the lengths of these strings, respectively. This efficient running time ensures the algorithm remains performant even for larger strings.

Motivation

The primary objective of this library is to devise a quick solution for the bounded version of the edit distance problem. Specifically, it aims to handle scenarios where, given a threshold 'k', it returns the edit distance 'd' if 'd' is less than or equal to 'k'. Otherwise, it indicates that the distance exceeds 'k'. This approach addresses the typical shortcoming of the classic edit distance algorithm, which tends to be inefficient in these cases.

The 'bounded' aspect of this library makes it particularly useful in scenarios where there's a predefined threshold of acceptable dissimilarity. This might be beneficial in time-sensitive applications or when working with very large strings, where calculating the full Levenshtein distance would be computationally prohibitive.

P.S. The 'k' in editdistancek refers to the upper bound for the algorithm. In practical terms, this means that the algorithm only computes the Levenshtein edit distance up to this 'k' value.

Installation

Add to Cargo.toml

[dependecies]
editdistancek = "1.*"

Usage

edit_distance("kitten".as_bytes(), "sitting".as_bytes()); // => 3
edit_distance_bounded("kitten".as_bytes(), "sitting".as_bytes(), 3); // => Some(3)
edit_distance_bounded("kitten".as_bytes(), "sitting".as_bytes(), 2); // => None

License

MIT License

Commit count: 17

cargo fmt