Crates.io | multiset-hash |
lib.rs | multiset-hash |
version | 0.2.0 |
source | src |
created_at | 2021-07-23 18:19:07.948307 |
updated_at | 2021-07-23 18:21:14.206843 |
description | An incremental hash function for multi-sets. |
homepage | |
repository | https://github.com/cronokirby/multiset-hash |
max_upload_size | |
id | 426363 |
size | 11,390 |
This crate provides a hash function for multi-sets, which can be updated incrementally. This function takes in a collection of objects, with each object potentially appearing multiple times, and outputs a digest. This result depends only on which objects are in the set, and how many times they appear. Calling the function with the objects in a different order, or grouped in a different way, will not change the result of the function.
This allows the function to be incrementally fed with new objects as they arrive, without having to keep the entire set in memory.
This crate was inspired by a 2003 paper
from Clarke et al. They define a function for hashing objects x
with multiplicity m(x)
as:
sum_x m(x) * H(x)
Where H
is a hash function into a suitable abelian group. In their case, they used
the multiplicative group of a finite field. Since then, we've developed much more elegant
(in my opinion) Elliptic Curve groups. This crate uses Ristretto
as implemented by the curve25519-dalek crate.
In the usual case, you add in different byte slices, with multiplicity For example,
to hash the set {cat, cat, dog, dog}
, you could do:
# use multiset_hash::RistrettoHash;
use digest::Digest;
use sha2::Sha512;
let mut hash = RistrettoHash::<Sha512>::default();
hash.add(b"cat", 2);
hash.add(b"dog", 2);
This is equivalent to hashing the same number of objects, with a different grouping and order:
# use multiset_hash::RistrettoHash;
# use sha2::Sha512;
use digest::Digest;
let mut hash = RistrettoHash::<Sha512>::default();
hash.add(b"dog", 1);
hash.add(b"cat", 1);
hash.add(b"cat", 1);
hash.add(b"dog", 1);
You can also hash objects in multiple pieces using the update
method:
# use multiset_hash::RistrettoHash;
# use sha2::Sha512;
use digest::Digest;
let mut hash = RistrettoHash::<Sha512>::default();
hash.update(b"d");
hash.update(b"og");
hash.end_update(2);
hash.add(b"cat", 2);
If you use update
, you must call end_update
before adding another object,
or calling finalize
to get the output of the hash function.