Crates.io | ofilter |
lib.rs | ofilter |
version | 0.4.3 |
source | src |
created_at | 2023-01-11 21:30:55.737575 |
updated_at | 2024-03-17 13:03:56.061214 |
description | OFilter is a fast thread-safe Bloom filter. |
homepage | https://gitlab.com/liberecofr/ofilter |
repository | https://gitlab.com/liberecofr/ofilter/tree/main |
max_upload_size | |
id | 756715 |
size | 115,277 |
OFilter is a fast thread-safe Bloom filter implemented in Rust.
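It implements:
- a basic Bloom filter,
- a thread-safe version of the basic Bloom filter,
- a streaming Bloom filter, in which older entries expire,
- a thread-safe version of the streaming filter.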
The basic Bloom filter is inspired by the existing and stable bloomfilter crate but does not directly depend on it. The API is slightly different and makes a few opinionated changes.
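To make the technique concrete, here is a minimal, single-threaded Bloom filter in plain Rust. It is a sketch under assumptions, not OFilter's implementation: the name `SketchBloom`, the use of std's `DefaultHasher`, and the double-hashing scheme (bit position i = h1 + i * h2 mod m) are illustrative choices. Setting an item turns on k bits; checking it reports "present" only if all k bits are on, which is why false positives are possible but false negatives are not.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::marker::PhantomData;

/// Hypothetical single-threaded Bloom filter (not the ofilter API).
pub struct SketchBloom<T> {
    bits: Vec<u64>, // m bits packed into 64-bit words
    nb_bits: u64,   // m
    nb_hash: u64,   // k
    _marker: PhantomData<T>,
}

impl<T: Hash> SketchBloom<T> {
    pub fn new(nb_bits: usize, nb_hash: usize) -> Self {
        assert!(nb_bits > 0 && nb_hash > 0);
        SketchBloom {
            bits: vec![0; (nb_bits + 63) / 64],
            nb_bits: nb_bits as u64,
            nb_hash: nb_hash as u64,
            _marker: PhantomData,
        }
    }

    /// Bit positions for `item`: h1 + i * h2 (mod m), for i in 0..k.
    fn positions(&self, item: &T) -> Vec<u64> {
        let mut h1 = DefaultHasher::new();
        item.hash(&mut h1);
        let a = h1.finish();
        // A salted second hasher gives a second, different 64-bit hash.
        let mut h2 = DefaultHasher::new();
        1u8.hash(&mut h2);
        item.hash(&mut h2);
        let b = h2.finish() | 1; // odd step, so it is never zero
        (0..self.nb_hash)
            .map(|i| a.wrapping_add(i.wrapping_mul(b)) % self.nb_bits)
            .collect()
    }

    pub fn set(&mut self, item: &T) {
        for p in self.positions(item) {
            self.bits[(p / 64) as usize] |= 1u64 << (p % 64);
        }
    }

    pub fn check(&self, item: &T) -> bool {
        self.positions(item)
            .into_iter()
            .all(|p| self.bits[(p / 64) as usize] & (1u64 << (p % 64)) != 0)
    }
}

fn main() {
    let mut filter: SketchBloom<usize> = SketchBloom::new(1024, 7);
    assert!(!filter.check(&42)); // empty filter: every bit is 0
    filter.set(&42);
    assert!(filter.check(&42)); // all 7 bits for 42 are now set
    println!("ok");
}
```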
In practice, I am using this filter to implement a self-expiring KV store.
Package has two optional features:
- serde: enables Serde support for (de)serialization.
- rand: enables random seeds (enabled by default).

While this is, to my knowledge, not used in "real" production, it is not a very complex codebase and comes with a rather complete test suite. Most of the building blocks it relies on are well-tested, widely used packages, so it should be OK to use. Again, DISCLAIMER: use at your own risk.
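```rust
use ofilter::Bloom;

fn main() {
    // Filter with a capacity of 100 items.
    let mut filter: Bloom<usize> = Bloom::new(100);
    assert!(!filter.check(&42)); // nothing inserted yet
    filter.set(&42);
    assert!(filter.check(&42)); // no false negatives after insertion
}
```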
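The example above uses the single-threaded `Bloom`. This README does not describe how OFilter's thread-safe variants work internally, so the following is only one common technique, shown as an assumption: store the bits in `AtomicU64` words so that `set` and `check` need only `&self`, and share the filter between threads with `Arc`. The type `AtomicBloom` and its methods are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical lock-free Bloom filter: bits live in atomic words, so
/// `set` and `check` take `&self` and can run concurrently.
pub struct AtomicBloom {
    words: Vec<AtomicU64>,
    nb_bits: u64,
    nb_hash: u64,
}

impl AtomicBloom {
    pub fn new(nb_bits: usize, nb_hash: usize) -> Self {
        assert!(nb_bits > 0 && nb_hash > 0);
        let nb_words = (nb_bits + 63) / 64;
        AtomicBloom {
            words: (0..nb_words).map(|_| AtomicU64::new(0)).collect(),
            nb_bits: nb_bits as u64,
            nb_hash: nb_hash as u64,
        }
    }

    /// k bit positions via double hashing on two salted hasher runs.
    fn positions<T: Hash>(&self, item: &T) -> Vec<u64> {
        let mut h1 = DefaultHasher::new();
        item.hash(&mut h1);
        let a = h1.finish();
        let mut h2 = DefaultHasher::new();
        1u8.hash(&mut h2);
        item.hash(&mut h2);
        let b = h2.finish() | 1;
        (0..self.nb_hash)
            .map(|i| a.wrapping_add(i.wrapping_mul(b)) % self.nb_bits)
            .collect()
    }

    pub fn set<T: Hash>(&self, item: &T) {
        for p in self.positions(item) {
            // fetch_or sets the bit atomically: concurrent setters never lose bits.
            self.words[(p / 64) as usize].fetch_or(1u64 << (p % 64), Ordering::Relaxed);
        }
    }

    pub fn check<T: Hash>(&self, item: &T) -> bool {
        self.positions(item).into_iter().all(|p| {
            self.words[(p / 64) as usize].load(Ordering::Relaxed) & (1u64 << (p % 64)) != 0
        })
    }
}

fn main() {
    let filter = Arc::new(AtomicBloom::new(4096, 5));
    // Four threads insert disjoint ranges of values into the same filter.
    let handles: Vec<_> = (0..4u64)
        .map(|t| {
            let f = Arc::clone(&filter);
            thread::spawn(move || {
                for i in 0..100u64 {
                    f.set(&(t * 100 + i));
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    // Every inserted value is reported present (no false negatives).
    assert!((0..400u64).all(|v| filter.check(&v)));
    println!("ok");
}
```

Relaxed ordering is enough here because bits are only ever turned on and each bit is independent; joining the threads is what makes all writes visible to the final check.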
Taken from a random CI job:
```text
running 7 tests
test tests::bench_extern_crate_bloom ... bench: 287 ns/iter (+/- 37)
test tests::bench_extern_crate_bloomfilter ... bench: 232 ns/iter (+/- 7)
test tests::bench_ofilter_bloom ... bench: 81 ns/iter (+/- 5)
test tests::bench_ofilter_stream ... bench: 257 ns/iter (+/- 39)
test tests::bench_ofilter_sync_bloom ... bench: 101 ns/iter (+/- 1)
test tests::bench_ofilter_sync_stream ... bench: 280 ns/iter (+/- 14)
test tests::bench_standard_hashset ... bench: 199 ns/iter (+/- 54)
test result: ok. 0 passed; 0 failed; 0 ignored; 7 measured; 0 filtered out; finished in 16.47s
```
This is not the result of extensive, thorough benchmarking, just a random snapshot at some point in development history.
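TL;DR: OFilter performs well compared to others such as bloom or bloomfilter. In this run, its basic Bloom filter took 81 ns/iter, versus 232 ns/iter for bloomfilter and 287 ns/iter for bloom.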
The streaming version is slower, but that is expected: it uses two filters under the hood and performs extra checks to know when to swap buffers.
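The two-filter design described above can be illustrated with a small sketch. This is an assumed design, not necessarily how OFilter's streaming filter works: `StreamSketch`, its helper `Buffer`, and the swap rule (swap once the current buffer has received `capacity` insertions) are hypothetical. New items go into the current buffer, lookups consult both buffers, and on a swap the previous buffer is discarded, so an item is remembered for roughly between `capacity` and `2 * capacity` subsequent insertions.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::mem;

/// Tiny fixed-size Bloom filter used as one "buffer" (hypothetical helper).
struct Buffer {
    bits: Vec<bool>,
    nb_hash: u64,
}

impl Buffer {
    fn new(nb_bits: usize, nb_hash: u64) -> Self {
        Buffer { bits: vec![false; nb_bits], nb_hash }
    }

    fn positions<T: Hash>(&self, item: &T) -> Vec<usize> {
        let mut h1 = DefaultHasher::new();
        item.hash(&mut h1);
        let a = h1.finish();
        let mut h2 = DefaultHasher::new();
        1u8.hash(&mut h2);
        item.hash(&mut h2);
        let b = h2.finish() | 1;
        let m = self.bits.len() as u64;
        (0..self.nb_hash)
            .map(|i| (a.wrapping_add(i.wrapping_mul(b)) % m) as usize)
            .collect()
    }

    fn set<T: Hash>(&mut self, item: &T) {
        for p in self.positions(item) {
            self.bits[p] = true;
        }
    }

    fn check<T: Hash>(&self, item: &T) -> bool {
        self.positions(item).into_iter().all(|p| self.bits[p])
    }
}

/// Hypothetical two-buffer streaming filter with expiring entries.
pub struct StreamSketch {
    current: Buffer,
    previous: Buffer,
    inserted: usize, // insertions into `current` since the last swap
    capacity: usize,
}

impl StreamSketch {
    pub fn new(capacity: usize, nb_bits: usize, nb_hash: u64) -> Self {
        StreamSketch {
            current: Buffer::new(nb_bits, nb_hash),
            previous: Buffer::new(nb_bits, nb_hash),
            inserted: 0,
            capacity,
        }
    }

    pub fn set<T: Hash>(&mut self, item: &T) {
        // Extra check on every insert: is it time to swap buffers?
        if self.inserted >= self.capacity {
            let fresh = Buffer::new(self.current.bits.len(), self.current.nb_hash);
            // `current` becomes `previous`; the old `previous` is dropped.
            self.previous = mem::replace(&mut self.current, fresh);
            self.inserted = 0;
        }
        self.current.set(item);
        self.inserted += 1;
    }

    pub fn check<T: Hash>(&self, item: &T) -> bool {
        // Up to two lookups instead of one.
        self.current.check(item) || self.previous.check(item)
    }
}

fn main() {
    let mut stream = StreamSketch::new(100, 2048, 5);
    for i in 0..200u32 {
        stream.set(&i);
    }
    // Both buffers are full: all 200 items are still present.
    assert!((0..200u32).all(|i| stream.check(&i)));
    for i in 200..400u32 {
        stream.set(&i);
    }
    // 200..400 are present; the buffers holding 0..200 have been dropped,
    // so those items are only reported if they collide (false positive).
    assert!((200..400u32).all(|i| stream.check(&i)));
    println!("ok");
}
```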
It is also interesting to note that a standard HashSet is quite efficient for small objects: the benchmark above uses isize entries, and HashSet (199 ns/iter) beat both external Bloom filter crates. So if your set is composed of small elements and is limited in absolute number, a simple set from the standard library may be good enough. Of course, a Bloom filter has other advantages than raw CPU usage; most importantly, it keeps memory usage low and constant, which is a great advantage. But keep in mind the problem you're trying to solve. Bench, measure, gather numbers: use facts, not intuition.
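The memory argument can be quantified with the textbook Bloom filter sizing formulas (general formulas, not taken from OFilter's source). For n expected items and a target false-positive rate p, the optimal number of bits is m = -n ln p / (ln 2)^2 and the optimal number of hash functions is k = (m / n) ln 2. At p = 1% this is about 9.6 bits per item regardless of the item type, fixed at creation time, whereas a `HashSet<isize>` on a 64-bit target needs at least 64 bits per entry for the key alone, plus table overhead, and grows with the number of entries. The helper `bloom_size` is hypothetical.

```rust
/// Textbook Bloom filter sizing for `n` items and false-positive rate `p`:
///   m = -n * ln(p) / (ln 2)^2   bits
///   k = (m / n) * ln 2          hash functions (rounded, at least 1)
fn bloom_size(n: usize, p: f64) -> (usize, u32) {
    assert!(n > 0 && p > 0.0 && p < 1.0);
    let ln2 = std::f64::consts::LN_2;
    let m = (-(n as f64) * p.ln() / (ln2 * ln2)).ceil();
    let k = ((m / n as f64) * ln2).round().max(1.0);
    (m as usize, k as u32)
}

fn main() {
    for &n in &[1_000usize, 1_000_000] {
        let (m, k) = bloom_size(n, 0.01);
        println!(
            "n = {n:>9}: {m} bits (~{} KiB), k = {k}, {:.1} bits/item",
            m / 8 / 1024,
            m as f64 / n as f64
        );
    }
}
```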
To run the benchmarks:
```shell
cd bench
rustup default nightly
cargo bench
```
OFilter is licensed under the MIT license.