| Crates.io | bitcoinleveldb-bloom |
| lib.rs | bitcoinleveldb-bloom |
| version | 0.1.19 |
| created_at | 2023-01-18 19:22:36.262033+00 |
| updated_at | 2025-12-01 16:57:47.668562+00 |
| description | LevelDB-compatible Bloom filter policy and test harness for the bitcoin-rs LevelDB port, implementing Kirsch–Mitzenmacher double hashing over byte slices. |
| homepage | |
| repository | https://github.com/klebs6/bitcoin-rs |
| max_upload_size | |
| id | 761965 |
| size | 223,031 |
LevelDB-compatible Bloom filter implementation and test harness used by the bitcoin-rs LevelDB port. This crate focuses on byte-level compatibility with the original C++ LevelDB BuiltinBloomFilter2 policy while providing a safe Rust façade.
This crate implements a Bloom filter policy and supporting utilities mirroring LevelDB's built-in Bloom filter (leveldb.BuiltinBloomFilter2). It is intended to be plugged into a LevelDB-compatible storage engine within the bitcoin-rs project, but it can also be used as a standalone Bloom filter implementation over arbitrary byte slices.
Key properties:
BuiltinBloomFilter2 (same hashing scheme, layout, and trailing k marker).Slice pointers (for FFI / LevelDB core) or from standard &[u8] slices.tracing for detailed diagnostics in performance and correctness testing.The crate exposes two main abstractions:
BloomFilterPolicy: a FilterPolicy / CreateFilter / KeyMayMatch implementation compatible with the LevelDB APIs provided elsewhere in bitcoin-rs.BloomTest: a stateful helper to accumulate keys, build a filter, and empirically measure false positive rates.This crate lives inside the bitcoin-rs monorepo:
It assumes the existence of several traits and types from the surrounding LevelDB port, particularly:
Slice: a lightweight view over *const u8 + length (mirroring LevelDB's Slice).FilterPolicy, CreateFilter, KeyMayMatch, Named: traits defining the filter behavior and metadata.If you are using this crate outside the monorepo, you will typically consume it indirectly via the LevelDB layer that defines these traits and the Slice abstraction.
In your Cargo.toml:
[dependencies]
bitcoinleveldb-bloom = "0.1.19"
This crate uses Rust 2021 edition and is licensed under MIT.
BloomFilterPolicy#[derive(Clone, Debug, Getters, Setters, Builder)]
#[getset(get = "pub", set = "pub")]
pub struct BloomFilterPolicy {
bits_per_key: usize,
k: usize,
}
BloomFilterPolicy encapsulates the Bloom filter configuration:
bits_per_key: approximate number of bits in the filter per inserted key.k: number of hash probes per key (automatically derived from bits_per_key).Construction:
impl BloomFilterPolicy {
pub fn new(bits_per_key_: i32) -> Self { /* ... */ }
}
pub fn new_bloom_filter_policy(bits_per_key_: i32) -> Box<dyn FilterPolicy> { /* ... */ }
Behavior:
bits_per_key_ is clamped to 0, then coerced to at least 1 bit per key.k is computed as k = floor(bits_per_key * ln(2)) ≈ floor(bits_per_key * 0.69), which approximates the theoretically optimal number of probes and is then clamped to 1 ≤ k ≤ 30.This matches the LevelDB design: choosing k = ln(2) * m/n where m is the number of bits and n is the number of keys minimizes false positives for a given filter size.
Interfaces implemented (from the LevelDB API layer):
Named — returns "leveldb.BuiltinBloomFilter2" for wire compatibility.FilterPolicy — marker trait for LevelDB filter policies.CreateFilter — builds filters from *const Slice.KeyMayMatch — checks potential membership using a Slice key and filter.The policy exposes a fully byte-based API for independent use:
impl BloomFilterPolicy {
pub fn create_filter_from_bytes(&self, keys: &[&[u8]], dst: &mut Vec<u8>);
pub fn key_may_match_bytes(&self, key: &[u8], bloom_filter: &[u8]) -> bool;
}
create_filter_from_bytes:
keys: &[&[u8]] — array of byte-slice keys.dst:
bytes zero-initialized bytes for the bitset.k.Filter layout:
num_keys = keys.len().bpk = self.bits_per_key.bits = max(64, num_keys * bpk) — small filters are padded to at least 64 bits to avoid pathological false positive rates.bytes = (bits + 7) / 8 and bits is rounded up to a multiple of 8.dst is extended by bytes zero bytes, then a trailing byte k is pushed.Hashing (Kirsch–Mitzenmacher double hashing):
h = leveldb_hash(key, seed = 0xbc9f1d34).delta = (h >> 17) | (h << 15) (rotate right by 17 bits).probe from 0 to k-1:
bitpos = (h % bits).array[bitpos / 8] |= 1 << (bitpos % 8).h = h.wrapping_add(delta).This is exactly the double-hashing construction described by Kirsch and Mitzenmacher, reducing the cost of k independent hashes to a single base hash plus linear updates.
key_may_match_bytes:
bits = (len(filter) - 1) * 8.k = filter[len(filter)-1].k > 30, the function treats the filter as a match for compatibility with potential future encodings (matching LevelDB's semantics).false on the first missing bit, true if all k bits are present.This provides a canonical, constant-time-per-probe membership test with standard Bloom filter semantics.
BloomTest#[derive(Debug, Getters, Setters, Builder)]
#[getset(get = "pub", set = "pub")]
pub struct BloomTest {
policy: BloomFilterPolicy,
filter: Vec<u8>,
keys: Vec<Vec<u8>>,
}
BloomTest is a harness around BloomFilterPolicy used to:
Key methods:
impl BloomTest {
pub fn reset(&mut self);
pub fn add_key_slice(&mut self, key: &[u8]);
pub fn add_key_str(&mut self, key: &str);
pub fn add_key_slice_object(&mut self, s: &Slice);
pub fn add(&mut self, s: &Slice);
pub fn build(&mut self);
pub fn filter_size(&self) -> usize;
pub fn dump_filter(&self);
pub fn matches_slice(&mut self, key: &[u8]) -> bool;
pub fn matches_str(&mut self, key: &str) -> bool;
pub fn false_positive_rate(&mut self) -> f64;
}
Behavior:
Default builds a policy with bits_per_key = 10 (roughly ~1% false positive rate in classical Bloom filter analysis).add_* methods stage keys in self.keys.build converts all staged keys into a single Bloom filter (using create_filter_from_bytes), stores it in self.filter, and clears keys.matches_* lazily builds the filter if keys is not empty before checking membership.false_positive_rate tests 10,000 new keys encoded via encode_fixed32_into into a 4-byte buffer, offset by 1_000_000_000 to avoid overlap with typical test data, counts how many yield true under matches_slice, and returns the observed fraction.dump_filter logs a textual visualization of the filter bits, ignoring the trailing k byte, with bits rendered as 1 (set) or . (unset) and space-separated per byte.
pub fn bloom_hash(key_: &Slice) -> u32 { /* ... */ }
Slice using leveldb_hash with seed 0xbc9f1d34.BloomFilterPolicy's hashing scheme.pub fn encode_fixed32_to_bytes(value: u32) -> [u8; 4];
pub fn encode_fixed32_into(value: u32, buffer: &mut [u8; 4]);
u32 values into 4-byte little-endian representation, mirroring LevelDB's fixed-width integer encoding.pub fn key(i: i32, buffer: *mut u8) -> Slice { /* ... */ }
i with encode_fixed32_to_bytes into buffer and returns a Slice over it.buffer is null, returns an empty Slice built from a null pointer and zero length (for defensive usage patterns).pub fn next_length(length: i32) -> i32 { /* ... */ }
use bitcoinleveldb_bloom::BloomFilterPolicy;
fn main() {
// Aim for ~10 bits per key (typical LevelDB default).
let policy = BloomFilterPolicy::new(10);
// Keys as raw byte slices.
let key1 = b"alpha";
let key2 = b"beta";
let key3 = b"gamma";
let keys: [&[u8]; 3] = [key1.as_ref(), key2.as_ref(), key3.as_ref()];
let mut filter = Vec::new();
policy.create_filter_from_bytes(&keys, &mut filter);
assert!(policy.key_may_match_bytes(b"alpha", &filter));
assert!(policy.key_may_match_bytes(b"beta", &filter));
// Typical Bloom filter behavior: non-members should usually be rejected,
// but could be false positives.
let might_contain = policy.key_may_match_bytes(b"delta", &filter);
println!("delta may be in the set: {might_contain}");
}
BloomTest to Empirically Check False Positivesuse bitcoinleveldb_bloom::BloomTest;
fn main() {
let mut tester = BloomTest::default();
// Add some keys.
tester.add_key_str("a");
tester.add_key_str("b");
tester.add_key_str("c");
// Build the filter explicitly (matches_* will also build lazily).
tester.build();
assert!(tester.matches_str("a"));
assert!(tester.matches_str("b"));
assert!(tester.matches_str("c"));
let fp_rate = tester.false_positive_rate();
println!("Observed false positive rate: {fp_rate:.4}");
}
FilterPolicy Trait Objectsuse bitcoinleveldb_bloom::new_bloom_filter_policy;
// use crate::leveldb::{FilterPolicy, Options, DB}; // from bitcoin-rs LevelDB layer
fn configure_leveldb() {
// Create a boxed policy with 10 bits per key.
let bloom_policy = new_bloom_filter_policy(10);
// This example is schematic; actual LevelDB options will be defined by the
// surrounding `bitcoin-rs` LevelDB wrapper.
//
// let mut options = Options::default();
// options.filter_policy = Some(bloom_policy);
// let db = DB::open(options, "/path/to/db").unwrap();
}
The Bloom filter implemented here follows the classical analysis:
m with k hash functions.n keys, the probability of a false positive is approximately:[ p \approx \left(1 - e^{-kn/m}\right)^k. ]
Choosing m = bits_per_key * n and k ≈ ln(2) * m/n minimizes p for fixed m, and yields:
[ p_{min} \approx (0.5)^{bits_per_key \cdot \ln 2} \approx (0.6185)^{bits_per_key}. ]
Thus:
bits_per_key = 10 gives p ≈ 0.8%.bits_per_key trades memory for increased false positives.The Kirsch–Mitzenmacher double hashing used in this crate preserves asymptotic false positive characteristics while reducing hash computation to a single base hash plus linear updates.
The filter construction additionally enforces bits ≥ 64 for small n, preventing degenerate configurations where m becomes so small that the asymptotic approximation no longer yields reasonable behavior.
All core operations (construction, hashing, filter building, membership testing) are instrumented with the tracing crate, using log levels:
info! for high-level events (policy construction, false positive statistics),debug! for structural details (filter sizes, reset actions),trace! for per-key and per-bit actions (probing, bit-setting),warn! and error! for anomalous conditions (invalid parameters, null pointers where unexpected).Enable appropriate tracing subscribers in your application to inspect filter behavior, verify interoperability, or benchmark performance.
The crate assumes a Slice type akin to LevelDB's, consisting of a raw pointer and length. The unsafe blocks are localized and guarded:
Slice::data() with zero length are handled gracefully, treated as empty keys.create_filter converts *const Slice + n into a slice with from_raw_parts, only when n > 0 and keys is non-null.Slice metadata.When integrating with C or C++:
Slice lifetime guarantees match those expected by the Rust side.*mut u8 passed into key points to at least 4 bytes of valid writable memory.This crate is distributed under the MIT License. See the repository for full licensing information.