| Crates.io | similar_lines |
| lib.rs | similar_lines |
| version | 0.1.0 |
| created_at | 2025-10-13 02:40:57.195592+00 |
| updated_at | 2025-10-13 02:40:57.195592+00 |
| description | Detect identical lines shared between two repositories using a suffix-array index |
| homepage | https://github.com/vincentzed/inference/tree/main/similar_lines |
| repository | https://github.com/vincentzed/inference |
| max_upload_size | |
| id | 1879934 |
| size | 8,408,233 |
Detect identical lines of source code shared between two repositories using a
suffix-array index backed by libsufr.
The project provides both a reusable Rust library and a CLI front-end.
Lines are concatenated into a single string S = L₀ ∘ 0x1E ∘ …, and a suffix array A is built over S; adjacent suffixes with an
LCP (longest common prefix) equal to the entire line correspond to duplicated
lines. The index construction runs in O(|S| log |S|), followed by a single linear scan
of the sorted suffixes.
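The idea can be illustrated with a small, self-contained sketch. It is not the crate's implementation: it does not use libsufr, it builds the suffix array with a naive comparison sort (so it is slower than the O(|S| log |S|) bound quoted above), and the names `duplicated_lines`, `lcp`, and `flush` are hypothetical. It also assumes that no input line contains the 0x1E separator byte.

```rust
use std::collections::HashMap;

/// Separator appended after every line (0x1E, the ASCII record separator).
const SEP: u8 = 0x1E;

/// Length of the longest common prefix of two byte slices.
fn lcp(a: &[u8], b: &[u8]) -> usize {
    a.iter().zip(b.iter()).take_while(|(x, y)| x == y).count()
}

/// Pushes the current run as a group if it holds more than one line index.
fn flush(groups: &mut Vec<(String, Vec<usize>)>, current: &mut Vec<usize>, lines: &[&str]) {
    if current.len() > 1 {
        current.sort_unstable();
        let content = lines[current[0]].to_string();
        groups.push((content, std::mem::take(current)));
    } else {
        current.clear();
    }
}

/// Hypothetical helper: returns every line that occurs more than once,
/// together with the indices (into `lines`) of all its occurrences.
pub fn duplicated_lines(lines: &[&str]) -> Vec<(String, Vec<usize>)> {
    // Build S = L0 . SEP . L1 . SEP . ... and remember where each line starts.
    let mut s: Vec<u8> = Vec::new();
    let mut line_at: HashMap<usize, usize> = HashMap::new();
    for (idx, line) in lines.iter().enumerate() {
        line_at.insert(s.len(), idx);
        s.extend_from_slice(line.as_bytes());
        s.push(SEP);
    }

    // Naive suffix array: sort all suffix offsets by the suffix they start.
    let mut sa: Vec<usize> = (0..s.len()).collect();
    sa.sort_by(|&i, &j| s[i..].cmp(&s[j..]));

    // Keep only suffixes that begin at a line start, in sorted order.
    let starts: Vec<(usize, usize)> = sa
        .iter()
        .filter_map(|&off| line_at.get(&off).map(|&idx| (off, idx)))
        .collect();

    // Linear scan: neighbours whose LCP covers the whole previous line
    // (including its separator) are copies of the same line.
    let mut groups = Vec::new();
    let mut current: Vec<usize> = Vec::new();
    let mut prev: Option<(usize, usize)> = None;
    for &(off, idx) in &starts {
        let same = match prev {
            Some((p_off, p_idx)) => lcp(&s[p_off..], &s[off..]) >= lines[p_idx].len() + 1,
            None => false,
        };
        if !same {
            flush(&mut groups, &mut current, lines);
        }
        current.push(idx);
        prev = Some((off, idx));
    }
    flush(&mut groups, &mut current, lines);
    groups
}
```

Calling `duplicated_lines(&["foo", "bar", "foo"])` groups the two copies of `foo` under indices 0 and 2. Including the separator in the LCP threshold is what keeps a line such as `oo` from matching the tail of `foo`.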
libsufr dependency
cargo build --release
cargo fmt
cargo clippy --all-targets --all-features -- -D warnings
cargo test
cargo run --release -- \
/path/to/repo-a \
/path/to/repo-b \
--min-length 40 \
--max-results 25 \
--format json \
--output matches.json
Key flags:
--min-length: minimum normalized line length (defaults to 30)
--max-results: optional cap on reported groups
--format: text (default) or json
--output: write to a file instead of stdout
use similar_lines::{find_similar_lines, Config};
# fn example() -> anyhow::Result<()> {
let results = find_similar_lines(&Config {
repo_a: "../repo-a".into(),
repo_b: "../repo-b".into(),
min_length: 48,
max_results: Some(5),
})?;
for group in results {
println!("{} ({} hits)", group.content, group.occurrences.len());
}
# Ok(())
# }
The MatchGroup values returned contain the duplicated line and all
Occurrences with repository name, relative path, and line number.
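For orientation, the result types might look roughly like the sketch below. Only `content` and `occurrences` appear in the example above; the `Occurrence` field names (`repo`, `path`, `line`), the derives, and the `summarize` helper are assumptions made for illustration, so check the crate documentation for the actual definitions.

```rust
use std::path::PathBuf;

/// One place where a duplicated line was found.
/// Field names here are hypothetical, not taken from the crate.
#[derive(Debug, Clone, PartialEq)]
pub struct Occurrence {
    pub repo: String,  // repository name
    pub path: PathBuf, // path relative to the repository root
    pub line: usize,   // line number
}

/// A duplicated line together with every place it occurs.
#[derive(Debug, Clone, PartialEq)]
pub struct MatchGroup {
    pub content: String,
    pub occurrences: Vec<Occurrence>,
}

/// Hypothetical helper: renders a group as a header plus one
/// `repo:path:line` entry per occurrence.
pub fn summarize(group: &MatchGroup) -> String {
    let mut out = format!("{} ({} hits)", group.content, group.occurrences.len());
    for occ in &group.occurrences {
        out.push_str(&format!("\n  {}:{}:{}", occ.repo, occ.path.display(), occ.line));
    }
    out
}
```

A formatter like `summarize` is one way to turn each group into the kind of text report a CLI could print.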
content_inspector heuristics.