Crates.io | qfilter |
lib.rs | qfilter |
version | 0.2.1 |
source | src |
created_at | 2022-12-04 19:33:15.595051 |
updated_at | 2024-07-27 08:39:03.558606 |
description | Efficient bloom filter like datastructure, based on the Rank Select Quotient Filter (RSQF) |
homepage | |
repository | https://github.com/arthurprs/qfilter |
max_upload_size | |
id | 729720 |
size | 82,392 |
Efficient bloom filter like data structure, based on the Rank Select Quotient Filter (RSQF).
This is a small and flexible general-purpose AMQ-Filter. It not only supports approximate membership testing like a bloom filter but also deletions, merging, resizing and serde serialization.
This data structure is similar to a hash table that stores fingerprints in a very compact way. Fingerprints are similar to a hash values, but are possibly truncated. The reason for false positives is that multiple items can map to the same fingerprint. For more information see the quotient filter Wikipedia page that describes a similar but less optimized version of the data structure. The actual implementation is based on the Rank Select Quotient Filter (RSQF).
The public API also exposes a fingerprint API, which can be used to succinctly store u64 hash values.
let mut f = qfilter::Filter::new(1000000, 0.01);
for i in 0..1000 {
f.insert(i).unwrap();
}
for i in 0..1000 {
assert!(f.contains(i));
}
The hashing algorithm used is xxhash3 which offers both high performance and stability across platforms.
For a given capacity and error probability the RSQF may require significantly less space than the equivalent bloom filter or other AMQ-Filters.
Bits per item | Error probability when full | Bits per item (Cont.) | Error (cont.) |
---|---|---|---|
3.125 | 0.362 | 19.125 | 6.87e-06 |
4.125 | 0.201 | 20.125 | 3.43e-06 |
5.125 | 0.106 | 21.125 | 1.72e-06 |
6.125 | 0.0547 | 22.125 | 8.58e-07 |
7.125 | 0.0277 | 23.125 | 4.29e-07 |
8.125 | 0.014 | 24.125 | 2.15e-07 |
9.125 | 0.00701 | 25.125 | 1.07e-07 |
10.125 | 0.00351 | 26.125 | 5.36e-08 |
11.125 | 0.00176 | 27.125 | 2.68e-08 |
12.125 | 0.000879 | 28.125 | 1.34e-08 |
13.125 | 0.000439 | 29.125 | 6.71e-09 |
14.125 | 0.00022 | 30.125 | 3.35e-09 |
15.125 | 0.00011 | 31.125 | 1.68e-09 |
16.125 | 5.49e-05 | 32.125 | 8.38e-10 |
17.125 | 2.75e-05 | .. | .. |
18.125 | 1.37e-05 | .. | .. |
Version 0.2 changed public APIs (e.g. fallible constructors) which required a major version bump.
Serialization is bidirectionally compatible between versions 0.1 and 0.2.
The implementation assumes the popcnt
instruction (equivalent to integer.count_ones()
) is present
when compiling for x86_64 targets. This is theoretically not guaranteed as the instruction is only
available on AMD/Intel CPUs released after 2007/2008. If that's not the case the Filter constructor will panic.
Support for such legacy x86_64 CPUs can be optionally enabled with the legacy_x86_64_support
which incurs a ~10% performance penalty.
This project is licensed under the MIT license.