jdb_xorf

Crates.iojdb_xorf
lib.rsjdb_xorf
version0.13.11
created_at2026-01-09 04:27:56.495705+00
updated_at2026-01-16 19:40:38.759391+00
descriptionUltra-fast Xor and Binary Fuse filters for Rust / 极致性能的 Rust Xor 与 Binary Fuse 过滤器
homepagehttps://github.com/js0-site/rust/tree/main/jdb_xorf
repositoryhttps://github.com/js0-site/rust
max_upload_size
id2031572
size234,790
i18n.site (i18nsite)

documentation

README

English | 中文


jdb_xorf: High-Performance Binary Fuse Filter in Rust

Introduction

jdb_xorf is a high-performance Binary Fuse filter implementation for Rust. Compared to Bloom or Cuckoo filters, this probabilistic data structure offers faster query speeds and smaller memory footprint. Binary Fuse filters represent the state-of-the-art in static set membership testing.

Binary Fuse is the pinnacle of the Xor Filter family and is currently the most efficient static membership test structure known. It has significant advantages over other filters in all key metrics:

Why Choose Binary Fuse?

  • Faster Construction: Uses Graph Partitioning technology to break the problem into small blocks adaptable to L1/L2 cache, making construction 10-20 times faster than traditional Xor Filters.
  • More Space Efficient: Lower space occupancy for the same false positive rate. Bf8 requires only about 8.64 bits/entry to achieve a false positive rate of about 0.39% (space overhead is only 1.08x the theoretical lower limit).
  • Better Locality: Partitioned design significantly reduces CPU cache misses.
  • Ultra-Fast Query: Query is strictly O(1), requiring only 3 memory accesses + 1 hash mixing + 2 XOR calculations.
  • No False Negatives: Guaranteed to return True if the element is in the set.

Performance Benchmark

Table of Contents

Prerequisites and Caveats

No Duplicate Keys Allowed

The construction algorithm for Binary Fuse filters (Bf8, Bf16, Bf32) has a strict prerequisite: the input data structure must not contain duplicate keys. If the input u64 hash values contain duplicates, the construction process will almost certainly fail. If you use the raw filter directly, you must remove duplicates yourself before construction.

automatic deduplication with Bf

It is recommended to use the Bf wrapper to handle arbitrary types (such as String, &[u8], etc.). Bf internally automatically handles all hash calculation, sorting, and deduplication work to ensure construction success rate. You only need to pass in the data, and leave the rest to it.

Usage Demo

Basic Binary Fuse Filter

use jdb_xorf::{Filter, Bf8};

let keys = vec![1u64, 2, 3];
let filter = Bf8::from(&keys);

assert!(filter.has(&1));
assert!(!filter.has(&4));

Construction from Arbitrary Types with Borrowed Query

Bf supports borrow-based querying similar to HashMap. For example, after referencing a String filter, you can query it directly using &str.

use jdb_xorf::{Filter, Bf, Bf8};

let fruits = vec!["apple".to_string(), "banana".to_string()];
// Bf automatically handles hashing and deduplication.
// Uses RapidHasher by default for extremely high performance.
let filter: Bf<String, Bf8> = Bf::from(&fruits);

// Query with &str directly on a String filter
assert!(filter.has("apple"));

Wrapper (Wrap) Mode: Using Bf<str>

If you want the filter to semantically represent a "Filter of strings" (rather than a "Filter of string references") at the type level, you can use .into() to convert Bf<&str> to Bf<str>. This is particularly useful for Unsized Types.

use jdb_xorf::{Bf, Bf8, Filter};

let keys = vec!["apple", "banana"];
// 1. Build a normal Bf<&str> first
let temp_filter: Bf<&str, Bf8> = Bf::from(&keys);

// 2. Convert into Bf<str> (Unsized)
let str_filter: Bf<str, Bf8> = temp_filter.into();

// 3. Query
assert!(str_filter.has("apple"));

Binary Data: Using Bf<[u8]>

Similarly, you can create filters for byte slices.

use jdb_xorf::{Bf, Bf8, Filter};

let data: Vec<&[u8]> = vec![b"hello", b"world"];
let temp_filter: Bf<&[u8], Bf8> = Bf::from(&data);

// Convert into Bf<[u8]>
let bytes_filter: Bf<[u8], Bf8> = temp_filter.into();

assert!(bytes_filter.has(b"hello".as_slice()));

Serialization and Deserialization (Optional Feature)

After enabling the bitcode feature, you can use bitcode::encode / bitcode::decode for serialization directly.

use jdb_xorf::{Bf, Bf8};

// 1. Serialize (Encode)
let keys = vec!["apple", "banana"];
// No need to convert to String, use &str directly
let filter: Bf<&str, Bf8> = Bf::from(&keys);

// Use bitcode library function directly
let bytes = bitcode::encode(&filter);

// 2. Deserialize (Decode)
// Can fully restore type information, including generic parameters
let loaded: Bf<&str, Bf8> = bitcode::decode(&bytes).expect("Decode failed");

assert!(loaded.has("apple"));

Embedding in Structs

Thanks to bitcode support, you can easily embed Bf into your own structs, such as an SSTable structure in a database.

use bitcode::{Decode, Encode};
use jdb_xorf::{Bf, Bf8};

#[derive(Encode, Decode)]
pub struct Sst {
  // Filter for binary data
  pub xorf: Bf<[u8], Bf8>,
}

let keys: Vec<&[u8]> = vec![b"key1", b"key2"];
let filter: Bf<&[u8], Bf8> = Bf::from(&keys);

let sst = Sst {
    xorf: filter.into(),
};

// Serialize the entire struct
let bytes = bitcode::encode(&sst);

Features

  • Ultra-Fast: Picosecond-level query latency.
  • High Efficiency: Extremely high space utilization (Bf8 requires only about 8.64 bit per entry, space overhead is 1.08x theoretical lower limit).
  • Flexible: Provides Bf adapter, supports non-u64 types and automatic deduplication.
  • Portable: Fully supports no_std, suitable for embedded environments.
  • Serialization: Optional support for bitcode serialization.

Algorithm Details (Mermaid)

1. Construction Phase (Peeling Phase)

graph TD
    Start["Start Construction"] --> Init["Calculate Parameters: seg_len, capacity"]
    Init --> SeedIter["Try Next Seed"]
    SeedIter --> Mapping["Map Key: Calculate 3 slots h0, h1, h2"]
    Mapping --> Bucketing["Update Bucket State: t2count++ / t2hash XOR= hash"]
    Bucketing --> FindAlone["Scan Buckets: Find alone buckets with count == 1"]
    FindAlone --> Queue["Add to alone queue"]
    Queue --> PeelLoop{"Is Queue Empty?"}
    PeelLoop -- "No" --> Pop["Pop bucket index, Push key to reverse_order stack"]
    Pop --> Update["Update 2 adjacent buckets: Decrease count and XOR hash sum"]
    Update --> NewAlone{"New alone bucket produced?"}
    NewAlone -- "Yes" --> Queue
    NewAlone -- "No" --> PeelLoop
    PeelLoop -- "Yes" --> Success{"All keys processed?"}
    Success -- "No" --> SeedIter
    Success -- "Yes" --> Done["Enter Solver Phase"]

2. Solver Phase (Solver Phase)

graph TD
    SStart["Start Solver"] --> SInit["Initialize fingerprints array"]
    SInit --> PopStack["Pop key and slot info from reverse_order stack top"]
    PopStack --> ReadOther["Read other 2 determined or initial fingerprints"]
    ReadOther --> Assign["Calculate current fingerprint: fp = target_f XOR fp_other1 XOR fp_other2"]
    Assign --> Next{"Is Stack Empty?"}
    Next -- "No" --> PopStack
    Next -- "Yes" --> SDone["BinaryFuse Construction Successful"]

3. Query Phase (Query Phase)

graph TD
    QKey["Input Query Key"] --> QHash["mix64 Hash Mixing"]
    QHash --> QSlots["Determine 3 slots: h0, h1, h2"]
    QSlots --> QRead["Atomic Read: fp0, fp1, fp2"]
    QRead --> QXor["XOR Operation: res = fp0 XOR fp1 XOR fp2"]
    QXor --> QMatch{"res == (hash as Fingerprint)?"}
    QMatch -- "Yes" --> QPres["Probably Present"]
    QMatch -- "No" --> QNot["Definitely Not Present"]
  1. Hashing: Mix keys using RapidHash or custom hasher.
  2. Mapping: Determine three slots in the partition graph based on hash value.
  3. Lookup: Determine membership by XORing the fingerprints of these three slots.

Technology Stack

  • Language: Rust (Edition 2024).
  • Core Algorithm: Binary-partitioned Fuse Graph algorithm.
  • Hash Algorithm: RapidHash (based on rapidhash crate), high-quality mixing function mix64.
  • Performance Evaluation: Criterion micro-benchmarks.

Directory Structure

  • src/: Core implementation.
    • base/: Generic Binary Fuse algorithm implementation (construction, query, tools).
    • bf.rs: Generic wrapper Bf implementation (arbitrary types & deduplication).
    • hash.rs: Hasher and high-quality mixing function implementation.
  • benches/: Performance benchmark suite.
  • analysis/: Uniformity and zero-distribution analysis tools.

API

Trait

  • Filter<T>: Core trait for membership testing.
    • has(&self, key: &T) -> bool
    • len(&self) -> usize

Types

  • Bf8, Bf16, Bf32: Managed memory filters (Aliases to Base<u8>, Base<u16>, Base<u32>).
  • Bf<T, F, H = RapidHasher>: Generic wrapper, uses hasher H and filter F to handle arbitrary type T, with automatic deduplication.

Summary Comparison Table

Filter Memory Footprint Query Speed Construction Speed Cache Friendliness Scenario
Binary Fuse Very Low (≈1.08x theoretical limit) Very Fast (3 accesses) Very Fast (partition optimized) Excellent Best choice for static massive data
Xor Filter Low Fast Slow Poor Previous generation solution
Bloom Filter Medium Slow (multiple hashes) Fast Poor Dynamic data/Simple scenarios
Cuckoo Filter Low Medium (random probing) Slow Poor Deletion support needed

Construction Failure Probability

The theoretical probability of Binary Fuse filter construction failure is extremely low. This library automatically retries 1000 times during construction (using a different random seed each time).

According to Mueller & Lemire's paper [2], the lower bound for single construction success rate is 90%. Based on measured data from this library (1000 rounds of testing on 100,000 random keys):

  • First-time Construction Success Rate: 98.70% (Successful on first try in 987 out of 1000 times)
  • Average Attempts: 1.013

This means the failure rate for a single construction is only .

Therefore, the probability of failure for 1000 consecutive constructions is:

is a value that can be considered absolutely 0. It is much smaller than the reciprocal of the total number of atoms in the universe, and hundreds of orders of magnitude lower than the probability of uncorrectable errors in modern computer hardware (about ).

Google's research on large-scale data centers shows that about 1.3% of machines experience at least one Uncorrectable Error per year. Assuming a construction process takes 0.1 seconds, the probability of a hardware error occurring during this period is on the order of .

Event Type Approximate Probability Risk Qualification
Hardware Bit Flip During Construction Real existent extremely low risk
Binary Fuse Construction Failure Physically "Absolutely Impossible"

Therefore, the library's design principle is: Treat construction failure as an unrecoverable fatal error (panic), rather than a runtime error (Result/TryFrom).

If you encounter a panic during construction, it is more likely due to:

  1. Duplicate keys in input data (This is the most common reason, even if you think you have deduplicated).
  2. Hardware failure (Memory bit flip, etc.).
  3. Extremely rare probabilistic event (In this case, a simple retry is sufficient, but it is almost impossible to encounter within the timescale of human civilization).

For ease of use, we prioritize using the From trait because it matches the psychological expectation in most scenarios: the construction process is always successful.

Historical Background

The technical evolution of probabilistic filters began with Bloom filters (1970) and went through improvements with Cuckoo filters (2014). The emergence of Xor filters in 2020 brought a paradigm shift, achieving better performance through perfect XOR summation. In 2022, Mueller and Lemire further proposed Binary Fuse filters, which approached theoretical limits in space and time efficiency through graph partitioning technology.

References


About

This project is an open-source component of js0.site ⋅ Refactoring the Internet Plan.

We are redefining the development paradigm of the Internet in a componentized way. Welcome to follow us:

Bench

Performance Benchmark

Library Filter Bf Ops Query Ops Memory Speedup
jdb Bf8 4959.87 86727.74 116.04 KB 1.65x
xorf BinaryFuse8 3971.65 52493.29 116.04 KB -
jdb Bf16 4809.60 76974.42 232.04 KB 1.56x
xorf BinaryFuse16 4015.96 49222.88 232.04 KB -
jdb Bf32 4676.60 67297.71 464.04 KB 1.39x
xorf BinaryFuse32 4016.81 48393.79 464.04 KB -

Accuracy

Library Filter False Positive Rate False Negative Rate
jdb Bf8 0.39196% 0
xorf BinaryFuse8 0.38768% 0
jdb Bf16 0.00132% 0
xorf BinaryFuse16 0.00165% 0
jdb Bf32 0.00000% 0
xorf BinaryFuse32 0.00000% 0

About

This project is an open-source component of js0.site ⋅ Refactoring the Internet Plan.

We are redefining the development paradigm of the Internet in a componentized way. Welcome to follow us:


jdb_xorf : 极致性能的 Rust Binary Fuse 过滤器

项目介绍

jdb_xorf 是针对 Rust 开发的高性能 Binary Fuse 过滤器实现。此类概率型数据结构相较于 Bloom 或 Cuckoo 过滤器,具备更快的查询速度与更小的内存占用。Binary Fuse 过滤器代表了目前静态集合成员检测技术的最高水平。

Binary Fuse 是 Xor Filter 系列的巅峰之作,也是目前已知最高效的静态成员检测结构。相比于其他过滤器,它在所有关键指标上都具有显著优势:

为什么选择 Binary Fuse?

  • 构建速度更快: 采用了图分区 (Graph Partitioning) 技术,将问题拆解为适应 L1/L2 缓存的小型块,构建速度比传统 Xor Filter 快 10-20 倍
  • 更节省空间: 相同误报率下空间占用更低。Bf8 仅需约 8.64 bits/entry 即可达到约 0.39% 的误报率(空间开销仅为理论下限的 1.08x)。
  • 更好的局部性: 分区设计极大幅度减少了 CPU 缓存失效。
  • 极速查询: 查询是严格的 O(1),仅需 3 次内存访问 + 1 次哈希混淆 + 2 次 XOR 计算。
  • 无误漏: 保证如果元素在集合中,一定会返回 True。

Performance Benchmark

目录

注意事项与前提条件

严禁重复元素

Binary Fuse 过滤器 (Bf8, Bf16, Bf32) 的构建算法有一个严格的前提条件:输入的数据结构中不得包含重复的键 (duplicate keys)。如果输入的 u64 哈希值存在重复,构建过程几乎肯定会失败。如果您直接使用原始过滤器,必须在构建前自行去除重复项。

Bf 自动去重

推荐使用 Bf 包装器来处理任意类型(如 String, &[u8] 等)。Bf 会在内部自动处理所有的哈希计算、排序和去重工作,确保构建成功率。您只需传入数据,剩下的交给它处理。

使用演示

基础 Binary Fuse 过滤器

use jdb_xorf::{Filter, Bf8};

let keys = vec![1u64, 2, 3];
let filter = Bf8::from(&keys);

assert!(filter.has(&1));
assert!(!filter.has(&4));

任意类型的构建与借用查询

Bf 支持类似 HashMap 的借用查询。例如,构建 String 过滤器后,可以直接使用 &str 查询。

use jdb_xorf::{Filter, Bf, Bf8};

let fruits = vec!["apple".to_string(), "banana".to_string()];
// Bf 会自动处理哈希和去重。
// 默认使用 RapidHasher 以获得极高性能。
let filter: Bf<String, Bf8> = Bf::from(&fruits);

// 直接使用 &str 查询 String 类型的过滤器
assert!(filter.has("apple"));

包装 (Wrap) 模式:使用 Bf<str>

如果你希望过滤器在类型层面上就代表 "字符串过滤器" (而非 "字符串引用过滤器"),可以使用 into() 方法将 Bf<&str> 转换为 Bf<str>

use jdb_xorf::{Bf, Bf8, Filter};

let keys = vec!["apple", "banana"];
// 1. 先构建常规的 Bf<&str>
let temp_filter: Bf<&str, Bf8> = Bf::from(&keys);

// 2. 转换为 Bf<str> (Unsized)
let str_filter: Bf<str, Bf8> = temp_filter.into();

// 3. 查询
assert!(str_filter.has("apple"));

二进制数据:使用 Bf<[u8]>

同样地,你可以创建针对字节切片的过滤器。

use jdb_xorf::{Bf, Bf8, Filter};

let data: Vec<&[u8]> = vec![b"hello", b"world"];
let temp_filter: Bf<&[u8], Bf8> = Bf::from(&data);

// 转换为 Bf<[u8]>
let bytes_filter: Bf<[u8], Bf8> = temp_filter.into();

assert!(bytes_filter.has(b"hello".as_slice()));

序列化与反序列化 (可选特性)

开启 bitcode 特性后,可直接使用 bitcode::encode / bitcode::decode 进行序列化。

use jdb_xorf::{Bf, Bf8};

// 1. 序列化 (Encode)
let keys = vec!["apple", "banana"];
// 无需转换为 String,直接使用 &str
let filter: Bf<&str, Bf8> = Bf::from(&keys);

// 直接使用 bitcode 库函数
let bytes = bitcode::encode(&filter);

// 2. 反序列化 (Decode)
// 能够完全还原类型信息,包括泛型参数
let loaded: Bf<&str, Bf8> = bitcode::decode(&bytes).expect("Decode failed");

assert!(loaded.has("apple"));

将过滤器作为结构体字段

得益于 bitcode 支持,你可以轻松地将 Bf 嵌入到自己的结构体中,甚至是数据库的 SSTable 结构。

use bitcode::{Decode, Encode};
use jdb_xorf::{Bf, Bf8};

#[derive(Encode, Decode)]
pub struct Sst {
  // 针对二进制数据的过滤器
  pub xorf: Bf<[u8], Bf8>,
}

let keys: Vec<&[u8]> = vec![b"key1", b"key2"];
let filter: Bf<&[u8], Bf8> = Bf::from(&keys);

let sst = Sst {
    xorf: filter.into(),
};

// 序列化整个结构体
let bytes = bitcode::encode(&sst);

特性介绍

  • 极速: 皮秒级查询延迟。
  • 高效: 空间利用率极高(Bf8 每条目仅需约 8.64 bit,空间开销为理论下限的 1.08x)。
  • 灵活: 提供 Bf 适配器,支持非 u64 类型并自动去重
  • 便携: 完整支持 no_std,适用于嵌入式环境。
  • 序列化: 可选支持 bitcode 或 DMA 零拷贝加载。

算法细节 (Mermaid)

1. 构造阶段 (Peeling Phase)

graph TD
    Start["开始构造"] --> Init["计算参数: seg_len, capacity"]
    Init --> SeedIter["尝试下一种子 (Seed)"]
    SeedIter --> Mapping["映射键: 计算 3 个槽位 h0, h1, h2"]
    Mapping --> Bucketing["更新桶状态: t2count++ / t2hash XOR= hash"]
    Bucketing --> FindAlone["扫描桶: 寻找 count == 1 的孤立桶"]
    FindAlone --> Queue["加入 alone 队列"]
    Queue --> PeelLoop{"队列是否为空?"}
    PeelLoop -- "否" --> Pop["弹出桶索引, 将键压入 reverse_order 栈"]
    Pop --> Update["更新相邻 2 个桶: 减少计数并异或哈希和"]
    Update --> NewAlone{"产生新孤立桶?"}
    NewAlone -- "是" --> Queue
    NewAlone -- "否" --> PeelLoop
    PeelLoop -- "是" --> Success{"是否处理了所有键?"}
    Success -- "否" --> SeedIter
    Success -- "是" --> Done["进入求解阶段"]

2. 求解阶段 (Solver Phase)

graph TD
    SStart["开始求解"] --> SInit["初始化指纹数组 fingerprints"]
    SInit --> PopStack["从 reverse_order 栈顶弹出键与槽位信息"]
    PopStack --> ReadOther["读取另外 2 个已确定或初始的指纹"]
    ReadOther --> Assign["计算当前指纹: fp = target_f XOR fp_other1 XOR fp_other2"]
    Assign --> Next{"栈是否为空?"}
    Next -- "否" --> PopStack
    Next -- "是" --> SDone["BinaryFuse 构建成功"]

3. 查询阶段 (Query Phase)

graph TD
    QKey["输入查询键"] --> QHash["mix64 哈希混淆"]
    QHash --> QSlots["确定 3 个槽位: h0, h1, h2"]
    QSlots --> QRead["原子读取: fp0, fp1, fp2"]
    QRead --> QXor["异或运算: res = fp0 XOR fp1 XOR fp2"]
    QXor --> QMatch{"res == (hash as Fingerprint)?"}
    QMatch -- "是" --> QPres["可能存在 (Probably)"]
    QMatch -- "否" --> QNot["绝对不存在 (Definitely)"]
  1. 哈希化: 使用 RapidHash 或自定义哈希器对键进行混淆。
  2. 映射: 根据哈希值在分区图中确定三个槽位。
  3. 查找: 通过对这三个槽位的指纹进行 XOR 运算,判断成员身份。

技术堆栈

  • 语言: Rust (Edition 2024)。
  • 核心算法: 二进制分区保险丝图 (Binary-partitioned Fuse Graph) 算法。
  • 哈希算法: RapidHash (基于 rapidhash crate), 高质量混淆函数 mix64
  • 性能评估: Criterion 微基准测试。

目录结构

  • src/: 核心实现。
    • base/: 泛型 Binary Fuse 算法实现 (构建、查询、工具)。
    • bf.rs: 通用包装器 Bf 实现 (支持任意类型与去重)。
    • hash.rs: 哈希器与高质量混淆函数 implementation。
  • benches/: 性能基准测试集。
  • analysis/: 均匀性与零分布分析工具。

API 说明

Trait

  • Filter<T>: 成员检测核心 trait。
    • has(&self, key: &T) -> bool
    • len(&self) -> usize

类型

  • Bf8, Bf16, Bf32: 托管内存的过滤器 (别名 Base<u8>, Base<u16>, Base<u32>)。
  • Bf<T, F, H = RapidHasher>: 通用包装器,使用哈希器 H 与过滤器 F 处理任意类型 T,具备自动去重功能。

总结对比表

过滤器 内存占用 查询速度 构建速度 缓存友好度 场景
Binary Fuse 极低 (≈1.08x 理论下限) 极快 (3 次访问) 极快 (分区优化) 优秀 静态海量数据最佳选择
Xor Filter 旧一代方案
Bloom Filter 慢 (多次哈希) 动态数据/简单场景
Cuckoo Filter 中 (随机探测) 需要支持删除

构建失败概率

Binary Fuse 过滤器构建失败的理论概率极低。本库在构建时会自动进行 1000 次重试(每次使用不同的随机种子)。

根据 Mueller & Lemire 的论文 [2],单次构建的成功率下界为 90%。而基于本库的实测数据(针对 100,000 个随机键进行 1000 轮测试):

  • 一次性构建成功率: 98.70% (1000 次中 987 次首次成功)
  • 平均尝试次数: 1.013

这意味着单次构建的失败率仅为

因此,连续 1000 次构建均失败的概率为:

是一个完全可以视为 0 的数值。 它比宇宙中的原子总数倒数还要小得多,更比现代计算机硬件发生不可纠正错误的概率(约为 ) 低几百个数量级。

Google 在大规模数据中心的研究表明,每年约有 1.3% 的机器经历过至少一次不可纠正的内存错误 (Uncorrectable Error)。假设一次构建过程耗时 0.1 秒,那么在此期间发生硬件错误的概率约为 量级。

事件类型 近似概率 风险定性
构建期间硬件位翻转 真实存在的极低风险
Binary Fuse 构建失败 物理上的“绝对不可能”

因此,库的设计原则是:将构建失败视为不可恢复的致命错误(panic),而非运行时错误(Result/TryFrom)。

如果您在构建过程中遇到了 panic,更有可能是因为:

  1. 输入数据存在重复键(这是最常见的原因,即使您认为已经去重)。
  2. 硬件故障(内存位翻转等)。
  3. 极度罕见的概率事件(此时只需简单的重试即可,但在人类文明的时间尺度内几乎不可能遇到)。

出于易用性考虑,我们优先使用 From trait,因为它符合绝大多数场景下的心理预期:构建过程总是成功的。

历史背景

概率过滤器的技术演进从 Bloom 过滤器 (1970) 开始,历经 Cuckoo 过滤器 (2014) 的改进。2020 年 Xor 过滤器的出现带来了范式转移,通过完美的 XOR 求和实现更优性能。2022 年,Mueller 与 Lemire 进一步提出 Binary Fuse 过滤器,通过图分区技术使其在空间和时间效率上逼近了理论极限。

参考引用


关于

本项目是 js0.site ⋅ 重构互联网计划 的开源组件。

我们正在以组件化的方式重新定义互联网的开发范式。欢迎关注我们:

评测

性能基准

过滤器 构建(万ops/s) 查询(万ops/s) 内存占用 对比
jdb Bf8 4959.87 86727.74 116.04 KB 1.65x
xorf BinaryFuse8 3971.65 52493.29 116.04 KB -
jdb Bf16 4809.60 76974.42 232.04 KB 1.56x
xorf BinaryFuse16 4015.96 49222.88 232.04 KB -
jdb Bf32 4676.60 67297.71 464.04 KB 1.39x
xorf BinaryFuse32 4016.81 48393.79 464.04 KB -

准确率

过滤器 假阳率 假阴率
jdb Bf8 0.39196% 0
xorf BinaryFuse8 0.38768% 0
jdb Bf16 0.00132% 0
xorf BinaryFuse16 0.00165% 0
jdb Bf32 0.00000% 0
xorf BinaryFuse32 0.00000% 0

关于

本项目为 js0.site ⋅ 重构互联网计划 的开源组件。

我们正在以组件化的方式重新定义互联网的开发范式,欢迎关注:

Commit count: 1

cargo fmt