| Crates.io | symscan |
| lib.rs | symscan |
| version | 0.7.2 |
| created_at | 2025-12-11 19:56:22.94054+00 |
| updated_at | 2025-12-12 23:28:42.916919+00 |
| description | Fast discovery of similar strings in bulk |
| homepage | https://github.com/yutanagano/symscan |
| repository | https://github.com/yutanagano/symscan |
| id | 1980480 |
| size | 76,223 |
SymScan enables extremely fast discovery of pairs of similar strings within and across large collections.
SymScan is a variation on the symmetric deletion algorithm, optimised for bulk searches for similar strings within one large string collection or across two at once (e.g. searching for similar protein sequences among a collection of 10M). The key algorithmic difference from traditional symmetric deletion is that SymScan uses a sort-merge join in place of hash maps to discover input strings that share common deletion variants. This sort-and-scan approach accepts an additional O(log N) factor in expected time complexity (where N is the total number of strings being compared) in exchange for improved cache locality and effective parallelization, and ends up being much faster for the above use case.
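The sort-and-scan idea can be illustrated with a small, single-threaded sketch. This is not SymScan's implementation or API: the function names `deletion_variants`, `levenshtein`, and `similar_pairs` are hypothetical, the sketch covers only the single-collection case, and it assumes that candidates are confirmed with a plain Levenshtein check, since sharing a deletion variant is necessary but not sufficient for two strings to be within the distance threshold.

```rust
use std::collections::BTreeSet;

/// All strings obtainable from `s` by deleting at most `max_del` characters,
/// including `s` itself. (Hypothetical helper, for illustration only.)
fn deletion_variants(s: &str, max_del: usize) -> BTreeSet<String> {
    let mut all = BTreeSet::new();
    all.insert(s.to_string());
    let mut frontier = vec![s.to_string()];
    for _ in 0..max_del {
        let mut next = Vec::new();
        for v in &frontier {
            let chars: Vec<char> = v.chars().collect();
            for skip in 0..chars.len() {
                // Rebuild the string without the character at position `skip`.
                let shorter: String = chars
                    .iter()
                    .enumerate()
                    .filter(|(i, _)| *i != skip)
                    .map(|(_, c)| *c)
                    .collect();
                if all.insert(shorter.clone()) {
                    next.push(shorter);
                }
            }
        }
        frontier = next;
    }
    all
}

/// Plain two-row Levenshtein distance, used to verify candidate pairs.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for i in 1..=a.len() {
        let mut cur = vec![i; b.len() + 1];
        for j in 1..=b.len() {
            let sub = prev[j - 1] + usize::from(a[i - 1] != b[j - 1]);
            cur[j] = sub.min(prev[j] + 1).min(cur[j - 1] + 1);
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Index pairs (i, j) with i < j whose strings are within `max_dist` edits.
fn similar_pairs(strings: &[&str], max_dist: usize) -> Vec<(usize, usize)> {
    // 1. Emit one (variant, source index) record per deletion variant.
    let mut records: Vec<(String, usize)> = Vec::new();
    for (idx, s) in strings.iter().enumerate() {
        for v in deletion_variants(s, max_dist) {
            records.push((v, idx));
        }
    }
    // 2. Sort, so that records sharing a variant become adjacent.
    records.sort();
    // 3. Scan each run of equal variants and pair up the indices in it.
    let mut candidates: Vec<(usize, usize)> = Vec::new();
    let mut start = 0;
    while start < records.len() {
        let mut end = start + 1;
        while end < records.len() && records[end].0 == records[start].0 {
            end += 1;
        }
        for i in start..end {
            for j in (i + 1)..end {
                candidates.push((records[i].1, records[j].1));
            }
        }
        start = end;
    }
    // 4. Deduplicate (again by sorting), then verify each candidate.
    candidates.sort_unstable();
    candidates.dedup();
    candidates
        .into_iter()
        .filter(|&(i, j)| levenshtein(strings[i], strings[j]) <= max_dist)
        .collect()
}
```

For example, calling `similar_pairs(&["hello", "hallo", "help", "world"], 1)` returns only the pair `(0, 1)`, since "hello" and "hallo" share the deletion variant "hllo" and are one edit apart. The sort in step 2 is where the extra O(log N) factor comes from. In return, step 3 becomes a linear pass over contiguous memory, and both the sort and the scan can be split into chunks for parallel execution, which this sketch does not attempt.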
brew install yutanagano/tap/symscan-cli
cargo add symscan
pip install symscan
SymScan is dual-licensed under the MIT and Apache 2.0 licenses. Unless explicitly stated otherwise, any contribution submitted by you, as defined in the Apache license, shall be dual-licensed as above, without any additional terms and conditions.