Crates.io | cyclic-poly-23 |
lib.rs | cyclic-poly-23 |
version | 0.3.1 |
source | src |
created_at | 2021-12-31 16:16:20.178867 |
updated_at | 2024-02-10 11:48:12.570487 |
description | A rolling, decomposable hash algorithm |
homepage | |
repository | https://code.alienscience.org/alienscience/cyclic-poly-23 |
max_upload_size | |
id | 505853 |
size | 149,263 |
This crate is an implementation of the hashing algorithm given in Section 2.3 of the paper:
Jonathan D. Cohen, Recursive Hashing Functions for n-Grams, ACM Trans. Inf. Syst. 15 (3), 1997.
Other opensource implementations from this paper usually concentrate on other algorithms that appear in later sections.
This hashing algorithm has features that stand out in comparison to other hashing algorithms:
A rolling hash (sometimes called a recurrent hash), can calculate a new hash value from a previous hash value and a small change in the data. Rolling hashes are useful to hash a window of data that slides along a collection of data:
This is much faster than recalculating a hash value for every block.
Rolling hashes are an efficient way for finding a block of data that has a given hash value.
A decomposable hash, can calculate a hash value for a block of data given a hash value for a bigger block of data and the hash value for the remaining data:
In the example above, the hash $h_3
$ can be calculated from the hashes $h_1
$ and $h_2
$.
The Adler32 hash/checksum algorithm can be used as a rolling hash and was made popular by the zlib
library and rsync
tool.
Performance comparisons were done on a laptop and are relative to the adler32 crate. To run this comparison execute cargo bench
.
The Cyclic Polynomial hashing is slower than Adler32 when hashing a single block.
Algorithm | MB/sec |
---|---|
Cyclic Poly 32bit | 2127 |
Cyclic Poly 64bit | 2126 |
Adler32 | 2562 |
The calculation of rolling hashes is faster than Adler32.
Algorithm | MB/sec |
---|---|
Cyclic Poly 32bit | 1254 |
Cyclic Poly 64bit | 1048 |
Adler32 | 170 |
To calculate hash collisions on random data execute:
cargo run --example collisions
This will output the number of collisions.
Algorithm | Collisions |
---|---|
Random Number Generator | 118 |
Cyclic Poly 32bit | 140 |
Adler32 | 2872 |
By default, this crate uses the Rust standard library enabled with the feature std
. The crate supports no_std
if it is included as a dependency without default features e.g.:
cyclic-poly-23 = { version = "xxx", default-features = false }