symscan

Crates.io: symscan
lib.rs: symscan
version: 0.7.2
created_at: 2025-12-11 19:56:22.94054+00
updated_at: 2025-12-12 23:28:42.916919+00
description: Fast discovery of similar strings in bulk
homepage: https://github.com/yutanagano/symscan
repository: https://github.com/yutanagano/symscan
Yuta Nagano (yutanagano)


SymScan

Check out the documentation page.

SymScan enables extremely fast discovery of pairs of similar strings within and across large collections.

SymScan is a variation on the symmetric deletion algorithm that is optimised for bulk-searching similar strings within one large string collection, or across two, in a single pass (e.g. searching for similar protein sequences within a collection of 10 million). The key algorithmic difference between SymScan and traditional symmetric deletion is the use of a sort-merge join in place of hash maps to discover input strings that share common deletion variants. This sort-and-scan approach costs an additional factor of O(log N) in expected time complexity, where N is the total number of strings being compared, in exchange for improved cache locality and effective parallelization. For the use case above, it ends up being much faster.
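
The sort-and-scan idea described above can be illustrated with a minimal, single-threaded sketch. This is not SymScan's actual implementation or API: the function names `deletion_variants`, `levenshtein`, and `similar_pairs` are hypothetical, the code handles a single collection only, and it assumes that candidate pairs are confirmed with a plain Levenshtein distance check. It only shows how sorting (variant, index) records and scanning runs of equal variants can stand in for a hash map.

```rust
use std::collections::BTreeSet;

/// All strings obtainable from `s` by deleting at most `max_del` characters
/// (including `s` itself). Hypothetical helper, not part of the symscan API.
fn deletion_variants(s: &str, max_del: usize) -> BTreeSet<String> {
    let mut all: BTreeSet<String> = BTreeSet::new();
    all.insert(s.to_string());
    let mut frontier = vec![s.to_string()];
    for _ in 0..max_del {
        let mut next = Vec::new();
        for v in &frontier {
            let chars: Vec<char> = v.chars().collect();
            for skip in 0..chars.len() {
                // Rebuild the string without the character at position `skip`.
                let shorter: String = chars
                    .iter()
                    .enumerate()
                    .filter(|(i, _)| *i != skip)
                    .map(|(_, c)| *c)
                    .collect();
                if all.insert(shorter.clone()) {
                    next.push(shorter);
                }
            }
        }
        frontier = next;
    }
    all
}

/// Plain dynamic-programming Levenshtein distance, used to verify candidates.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for i in 1..=a.len() {
        let mut cur = vec![i; b.len() + 1];
        for j in 1..=b.len() {
            let sub = prev[j - 1] + usize::from(a[i - 1] != b[j - 1]);
            cur[j] = sub.min(prev[j] + 1).min(cur[j - 1] + 1);
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Index pairs (i, j) with i < j whose strings are within `max_dist` edits.
fn similar_pairs(strings: &[&str], max_dist: usize) -> Vec<(usize, usize)> {
    // 1. Emit one (variant, index) record per deletion variant of each string.
    let mut records: Vec<(String, usize)> = Vec::new();
    for (idx, s) in strings.iter().enumerate() {
        for v in deletion_variants(s, max_dist) {
            records.push((v, idx));
        }
    }
    // 2. Sort so that records sharing a variant become adjacent.
    records.sort();
    // 3. Scan each run of equal variants and collect candidate pairs.
    let mut candidates: BTreeSet<(usize, usize)> = BTreeSet::new();
    let mut start = 0;
    while start < records.len() {
        let mut end = start + 1;
        while end < records.len() && records[end].0 == records[start].0 {
            end += 1;
        }
        for i in start..end {
            for j in (i + 1)..end {
                candidates.insert((records[i].1, records[j].1));
            }
        }
        start = end;
    }
    // 4. A shared variant only bounds the distance by 2 * max_dist, so verify.
    candidates
        .into_iter()
        .filter(|&(i, j)| levenshtein(strings[i], strings[j]) <= max_dist)
        .collect()
}
```

For example, calling `similar_pairs(&["CASSLGQ", "CASSLGE", "CASSLG", "XYZ"], 1)` returns the three index pairs among the first three strings and nothing involving `"XYZ"`. The sort in step 2 is where the extra O(log N) factor comes from; in exchange, step 3 walks memory sequentially instead of probing a hash table.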

Installing

CLI

brew install yutanagano/tap/symscan-cli

Rust library

cargo add symscan

Python package

pip install symscan

Licensing

SymScan is dual-licensed under the MIT and Apache 2.0 licenses. Unless explicitly stated otherwise, any contribution submitted by you, as defined in the Apache license, shall be dual-licensed as above, without any additional terms and conditions.
