[![Test](https://github.com/malwaredb/lzjd-rs/actions/workflows/test.yml/badge.svg)](https://github.com/malwaredb/lzjd-rs/actions/workflows/test.yml)[![Lint](https://github.com/malwaredb/lzjd-rs/actions/workflows/lint.yml/badge.svg)](https://github.com/malwaredb/lzjd-rs/actions/workflows/lint.yml)[![Crates.io Version](https://img.shields.io/crates/v/malwaredb-lzjd)](https://crates.io/crates/malwaredb-lzjd)

# LZJD

[Documentation](https://docs.rs/malwaredb-lzjd)

Rust implementation of Lempel-Ziv Jaccard Distance (LZJD) algorithm based on [jLZJD](https://github.com/EdwardRaff/jLZJD) by Edward Raff.

Main differences:
- Rust instead of Java
- Can use any hasher (executable uses CRC32) instead of just Murmur3
- Does not allocate memory for every unique hash, instead keeps k=1024 smallest
- Based on `Vec` instead of `IntSetNoRemove`, which is more like HashMap
- Hash files are considerably smaller if small sequences have been digested

This fork has minor changes:
* Update to Rust edition 2021.
* Remove dependencies preventing it from working on non-x86 hardware.

```
USAGE:
    lzjd [FLAGS] [OPTIONS] ...

FLAGS:
    -c, --compare        compare SDBFs in file, or two SDBF files
    -r, --deep           generate SDBFs from directories and files
    -g, --gen-compare    compare all pairs in source data
    -h, --help           Prints help information
    -V, --version        Prints version information

OPTIONS:
    -o, --output       send output to files
    -t, --threshold    only show results >= threshold [default: 1]

ARGS:
    ...    Sets the input file to use
```

## See also:
- [Original paper](http://www.edwardraff.com/publications/alternative-ncd-lzjd.pdf)
- [Follow-up paper](https://arxiv.org/abs/1708.03346)
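
## Algorithm sketch

The idea behind LZJD, as summarized at the top of this README (Lempel-Ziv substrings, hashed, with only the k smallest hashes kept and compared by Jaccard similarity), can be illustrated with a few lines of standard-library Rust. This is a minimal sketch under stated assumptions, not this crate's API: the names `hash_bytes`, `sketch_digest` and `sketch_similarity` are hypothetical, the hasher is `std`'s `DefaultHasher` rather than CRC32 or Murmur3, and unlike this crate the sketch holds every unique hash in memory before truncating to `k`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hash a byte slice with the standard library's default hasher.
/// (Assumption for this sketch; any 64-bit hash would do.)
fn hash_bytes(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

/// Build a digest: the `k` smallest hashes of the Lempel-Ziv substrings
/// of `data`, returned in ascending order.
fn sketch_digest(data: &[u8], k: usize) -> Vec<u64> {
    let mut seen: HashSet<u64> = HashSet::new();
    let mut start = 0;
    for end in 1..=data.len() {
        // Grow the current substring until it is one we have not seen.
        // Hash collisions are treated as "already seen" in this sketch.
        let h = hash_bytes(&data[start..end]);
        if seen.insert(h) {
            // New substring: it joins the set and the next one starts here.
            start = end;
        }
    }
    let mut digest: Vec<u64> = seen.into_iter().collect();
    digest.sort_unstable();
    digest.truncate(k);
    digest
}

/// Jaccard similarity of two digests: |A ∩ B| / |A ∪ B|.
/// Two empty digests are treated as identical (a choice made for this sketch).
fn sketch_similarity(a: &[u64], b: &[u64]) -> f64 {
    let set_a: HashSet<u64> = a.iter().copied().collect();
    let set_b: HashSet<u64> = b.iter().copied().collect();
    let union = set_a.union(&set_b).count();
    if union == 0 {
        return 1.0;
    }
    let intersection = set_a.intersection(&set_b).count();
    intersection as f64 / union as f64
}
```

Calling `sketch_digest(bytes, 1024)` on two inputs and passing the results to `sketch_similarity` gives a value between 0 and 1; the corresponding distance is one minus that value. Keeping only the smallest `k` hashes bounds the digest size regardless of input length, which is the property the `k=1024` note in the differences list refers to.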