| Crates.io | mphf |
| lib.rs | mphf |
| version | 0.2.0 |
| created_at | 2020-10-04 16:50:31.450459+00 |
| updated_at | 2026-01-02 00:47:36.566342+00 |
| description | An exercise in implementing minimal perfect hash table. |
| homepage | |
| repository | https://github.com/CasualX/mphf |
| max_upload_size | |
| id | 296078 |
| size | 18,788 |
An exercise in implementing minimal perfect hash table.
mphf::static_map!(MY_TABLE: i32; {
"apple" => 1,
"banana" => 2,
"cherry" => 3,
"date" => 4,
});
let map = MY_TABLE.as_ref();
assert_eq!(map.get("banana"), Some(&2));
assert_eq!(map.get("fig"), None);
There's an array/slice of keys and values of the same length. A hash function is used to map a key to an index in these arrays:
let index = hash(key) % N;
However, this does not guarantee that all the keys map to unique indices, leading to collisions.
To resolve this, the idea is to introduce a second level of hashing.
First, let's introduce a keyed hash function. A keyed hash function takes an additional seed parameter, effectively creating a family of hash functions where each seed selects a different one:
fn hash(seed: u32, key: &str) -> u32;
MurmurHash3 is exactly such a hash function and is used in this implementation.
In the first stage, we hash the key with seed 0. We'll use this hash modulo the number of buckets to look up the 'displacement' seed for the second stage.
Then we hash the key again with the displacement seed to get the final index into the keys/values arrays:
static SEEDS: [u32; M] = [ ... ]; // Displacement seeds for each bucket
static KEYS: [&str; N] = [ ... ]; // Keys array
let bucket = hash(0, key) % SEEDS.len();
let seed = SEEDS[bucket];
let index = hash(seed, key) % KEYS.len();
if KEYS[index] != key {
return None;
}
return Some(index);
The displacement seeds are chosen such that all keys map to unique indices in the second stage.
There is no clever trick being used here. When building the displacement seeds table we simply brute-force search for seeds that work for each bucket.
To make this work in practice, the entire construction runs in const fn, which imposes strict limits on how much brute-force search can be performed during compilation.
Does it work? Yes. Is it faster than match on string literals? No. It's about 4x slower in my benchmarks.
running 2 tests
test bench_match ... bench: 57.07 ns/iter (+/- 0.78)
test bench_phf ... bench: 236.27 ns/iter (+/- 5.05)
Licensed under MIT License, see license.txt.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, shall be licensed as above, without any additional terms or conditions.