Crates.io | mqfilters |
lib.rs | mqfilters |
version | |
source | src |
created_at | 2024-11-03 13:23:40.130519 |
updated_at | 2024-11-29 21:15:58.047994 |
description | Highly optimized approximate membership query filters (bloom, cuckoo, xor, quotient) with SIMD support |
homepage | https://github.com/farazdagi/mqfilters |
repository | https://github.com/farazdagi/mqfilters |
max_upload_size | |
id | 1433762 |
Cargo.toml error: | TOML parse error at line 18, column 1 | 18 | autolib = false | ^^^^^^^ unknown field `autolib`, expected one of `name`, `version`, `edition`, `authors`, `description`, `readme`, `license`, `repository`, `homepage`, `documentation`, `build`, `resolver`, `links`, `default-run`, `default_dash_run`, `rust-version`, `rust_dash_version`, `rust_version`, `license-file`, `license_dash_file`, `license_file`, `licenseFile`, `license_capital_file`, `forced-target`, `forced_dash_target`, `autobins`, `autotests`, `autoexamples`, `autobenches`, `publish`, `metadata`, `keywords`, `categories`, `exclude`, `include` |
size | 0 |
mqfilters
)Highly optimized approximate membership query filters in safe Rust.
The purpose of this repository is to gather multiple implementations of approximate membership query algorithms, along with a brief discussion of their applicability in different contexts. The crust of the problem is to be able to provide a data structure that will take a fraction of space compared to the original data, while still being able to answer the question whether a given element is in the set or not.
bf
)bf
)Based on the original Bloom Filter described by Burton Howard Bloom in the seminal Space/Time Trade-offs in Hash Coding with Allowable Errors, 1970 paper.
Bloom considered three factors when optimizing the membership queries:
The main idea is that by introducing a small probability of false positives, we can significantly reduce the space usage, all without increasing the reject time. This reduction in space can be very significant as it can become a deciding factor in whether a filter can be stored in memory or not.
The original design is only slightly updated, to use double hashing (instead of multiple distinct hash functions), as described in Less Hashing, Same Performance: Building a Better Bloom Filter, 2006 by Kirsch and Mitzenmacher.
Currently implemented is a semi-dynamic Bloom Filter, which means that it supports insertions, but not deletions, so filter can be grow, but not shrink.
SSTables
in LSMTrees
, where we have to merge data from time to time and we can create a new
filter during such merge. Having this invariant allows for a more efficient implementation.