bloomz

crates.io: bloomz
lib.rs: bloomz
version: 0.1.0
created_at: 2025-08-10 11:59:53.515245+00
updated_at: 2025-08-10 11:59:53.515245+00
description: A fast, flexible Bloom filter library for Rust with parallel operations support
homepage: https://github.com/pixperk/bloomz
repository: https://github.com/pixperk/bloomz
id: 1788840
size: 66,996
Yashaswi Kumar Mishra (pixperk)

documentation: https://docs.rs/bloomz

README

🌸 Bloomz

Fast, flexible Bloom filter for Rust with pluggable hashers and parallel operations.


Features

  • Fast: Optimized bit operations with efficient double hashing
  • Flexible: Pluggable hash builders (SipHash, AHash, xxHash, etc.)
  • Parallel: Batch operations with Rayon for multi-core performance
  • Serializable: JSON and binary serialization with Serde
  • Safe: No unsafe code, extensive testing

Quick Start

Add to your Cargo.toml:

[dependencies]
bloomz = "0.1"

# Or, to enable optional features, use this line instead
# (TOML does not allow the same key twice):
# bloomz = { version = "0.1", features = ["serde", "rayon"] }

Basic Usage

use bloomz::BloomFilter;

// Create a filter for ~1000 items with 1% false positive rate
let mut filter = BloomFilter::new_for_capacity(1000, 0.01);

// Insert items
filter.insert(&"hello");
filter.insert(&42);

// Check membership
assert!(filter.contains(&"hello"));
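// "world" was never inserted, so this is almost certainly false. Bloom filters
// can return false positives, but never false negatives.
assert!(!filter.contains(&"world"));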

Parallel Operations (with rayon feature)

use bloomz::BloomFilter;
use rayon::prelude::*;
use std::collections::hash_map::RandomState;

let rs = RandomState::new();
// 10,000 bits and 7 hash functions, hashed with std's RandomState
let mut filter = BloomFilter::with_hasher(10000, 7, rs);

// Parallel batch insert
let items: Vec<i32> = (0..1000).collect();
filter.insert_batch(items.par_iter().cloned());

// Parallel batch contains
let test_items: Vec<i32> = (500..600).collect();
let results = filter.contains_batch(test_items.par_iter().cloned());

// Check if all items are present
let all_present = filter.contains_all(test_items.par_iter().cloned());
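The README does not show how the Rayon-backed batch methods work internally. The following std-only sketch illustrates the general idea behind a parallel batch membership check: lookups only read the filter, so one shared reference can be queried from several scoped threads at once. `contains_batch_par` and its `contains` closure parameter are hypothetical names, not bloomz API, and `std::thread::scope` stands in for Rayon here.

```rust
use std::thread;

/// Hypothetical std-only analogue of a parallel batch lookup (bloomz itself
/// uses Rayon): split `items` into chunks and test each chunk on its own
/// scoped thread. `contains` only needs shared (&) access, e.g. a closure
/// that calls a Bloom filter's `contains` method.
fn contains_batch_par<T, F>(items: &[T], threads: usize, contains: F) -> Vec<bool>
where
    T: Sync,
    F: Fn(&T) -> bool + Sync,
{
    let threads = threads.max(1);
    // Ceiling division so at most `threads` chunks are created; chunks(0) would panic.
    let chunk_len = ((items.len() + threads - 1) / threads).max(1);
    let contains = &contains; // shared reference, copied into each thread
    thread::scope(|s| {
        let handles: Vec<_> = items
            .chunks(chunk_len)
            .map(|part| s.spawn(move || part.iter().map(|x| contains(x)).collect::<Vec<bool>>()))
            .collect();
        // Join in spawn order so the results line up with the input order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Parallel insertion is harder, because threads write to the same bit array; a common approach is to fill per-thread bit arrays and merge them with bitwise OR, though the README does not say which approach bloomz takes.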

Serialization (with serde feature)

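use bloomz::BloomFilter;

// JSON needs the "serde" feature plus serde_json in your own [dependencies]
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut filter = BloomFilter::new_for_capacity(100, 0.01);
    filter.insert(&"data");

    // JSON serialization
    let json = serde_json::to_string(&filter)?;
    let from_json: BloomFilter = serde_json::from_str(&json)?;
    assert!(from_json.contains(&"data"));

    // Binary serialization
    let bytes = filter.to_bytes();
    let from_bytes = BloomFilter::from_bytes(&bytes).unwrap();
    assert!(from_bytes.contains(&"data"));

    Ok(())
}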
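The README does not document the layout produced by `to_bytes()`. As an illustration of what a binary format for a bit vector can look like, here is a sketch assuming a hypothetical layout: an 8-byte little-endian bit count followed by the u64 words in little-endian order. `words_to_bytes` and `words_from_bytes` are illustrative names, not bloomz functions.

```rust
/// Hypothetical byte layout (not bloomz's actual format): an 8-byte
/// little-endian bit count followed by each u64 word in little-endian order.
fn words_to_bytes(num_bits: u64, words: &[u64]) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + words.len() * 8);
    out.extend_from_slice(&num_bits.to_le_bytes());
    for w in words {
        out.extend_from_slice(&w.to_le_bytes());
    }
    out
}

/// Inverse of `words_to_bytes`; returns None if the buffer is truncated or
/// its word count does not match the declared bit count.
fn words_from_bytes(bytes: &[u8]) -> Option<(u64, Vec<u64>)> {
    if bytes.len() < 8 || (bytes.len() - 8) % 8 != 0 {
        return None;
    }
    let num_bits = u64::from_le_bytes(bytes[..8].try_into().ok()?);
    let words: Vec<u64> = bytes[8..]
        .chunks_exact(8)
        .map(|c| u64::from_le_bytes(c.try_into().unwrap()))
        .collect();
    // Exactly ceil(num_bits / 64) words are expected.
    let expected = num_bits / 64 + u64::from(num_bits % 64 != 0);
    if words.len() as u64 != expected {
        return None;
    }
    Some((num_bits, words))
}
```

Fixing the byte order with `to_le_bytes` keeps the bytes portable between machines, and validating the word count on decode rejects a truncated or corrupted buffer instead of misreading it.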

Custom Hash Builders

use bloomz::BloomFilter;
use std::collections::hash_map::RandomState;

// Default hasher: SipHash (resistant to hash-flooding attacks)
let filter1 = BloomFilter::new(1000, 5);

// Explicit RandomState
let rs = RandomState::new();
let filter2 = BloomFilter::with_hasher(1000, 5, rs);

// Faster, non-cryptographic hashers: enable bloomz's "fast-ahash" feature
// and add the ahash crate to your own [dependencies]
let filter3 = BloomFilter::with_hasher(1000, 5, ahash::RandomState::new());
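Pluggable hashers generally work by making the filter generic over `std::hash::BuildHasher`. The sketch below uses a hypothetical helper, `bit_index` (Rust 1.71+ for `BuildHasher::hash_one`), to show the property that matters when filters are merged or reloaded: the same hasher state always maps an item to the same bit, whereas each `RandomState::new()` call picks fresh keys. `deterministic_builder` is likewise a hypothetical name.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{BuildHasher, BuildHasherDefault, Hash};

/// Hypothetical helper: map `item` to a bit position in 0..num_bits using any
/// BuildHasher (std RandomState, ahash::RandomState, ...).
fn bit_index<S: BuildHasher, T: Hash + ?Sized>(build: &S, item: &T, num_bits: u64) -> u64 {
    // hash_one builds a fresh Hasher from the shared state and hashes `item`.
    build.hash_one(item) % num_bits
}

/// A fixed-key builder: every instance hashes identically within a given
/// Rust release, unlike RandomState::new(), which picks new keys each time.
fn deterministic_builder() -> BuildHasherDefault<DefaultHasher> {
    BuildHasherDefault::default()
}
```

Filters that will be combined should therefore share one cloned `RandomState` or use a fixed-key builder. Note that std does not promise `DefaultHasher` output stays the same across Rust releases, which matters for filters persisted to disk.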

Performance

Bloomz uses several optimizations:

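  • Double Hashing: Derive all k bit positions from two base hashes, typically as g_i(x) = h1(x) + i·h2(x) mod m, instead of computing k independent hashes
  • Efficient Bit Operations: Bits are packed into u64 words, so setting or testing a bit is a shift and a mask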
  • Parallel Processing: Multi-threaded batch operations with Rayon
  • Zero-Copy Serialization: Direct bit vector serialization
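The double-hashing and word-packing optimizations above can be illustrated together in a small, self-contained filter. This is a hypothetical sketch (`SketchBloom` is not a bloomz type) that assumes std's SipHash-based `DefaultHasher` for both base hashes, salting the second with a constant prefix, and derives positions with the Kirsch-Mitzenmacher scheme g_i(x) = (h1(x) + i·h2(x)) mod m.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Illustrative Bloom filter (hypothetical, not bloomz's code): bits are packed
/// into u64 words and the k bit positions come from double hashing.
struct SketchBloom {
    words: Vec<u64>,
    num_bits: u64,
    k: u32,
}

impl SketchBloom {
    fn new(num_bits: u64, k: u32) -> Self {
        assert!(num_bits > 0 && k > 0, "need at least one bit and one hash");
        let n_words = ((num_bits + 63) / 64) as usize;
        SketchBloom { words: vec![0; n_words], num_bits, k }
    }

    /// Two base hashes: a plain SipHash of the item, and a second SipHash
    /// salted with a constant prefix so it differs from the first.
    fn base_hashes<T: Hash + ?Sized>(item: &T) -> (u64, u64) {
        let mut a = DefaultHasher::new();
        item.hash(&mut a);
        let mut b = DefaultHasher::new();
        0x9E37_79B9_7F4A_7C15_u64.hash(&mut b);
        item.hash(&mut b);
        (a.finish(), b.finish())
    }

    /// g_i(x) = (h1 + i * h2) mod m for i in 0..k, computed in u128 to avoid overflow.
    fn indices<T: Hash + ?Sized>(&self, item: &T) -> Vec<u64> {
        let m = self.num_bits;
        let (h1, h2) = Self::base_hashes(item);
        let h1 = h1 % m;
        // A zero step would map every g_i to the same bit, so force it nonzero.
        let h2 = match h2 % m {
            0 => 1,
            s => s,
        };
        (0..u64::from(self.k))
            .map(|i| ((u128::from(h1) + u128::from(i) * u128::from(h2)) % u128::from(m)) as u64)
            .collect()
    }

    fn insert<T: Hash + ?Sized>(&mut self, item: &T) {
        for idx in self.indices(item) {
            // Word idx / 64 holds the bit; idx % 64 selects it within the word.
            self.words[(idx / 64) as usize] |= 1u64 << (idx % 64);
        }
    }

    /// true = possibly present (false positives happen); false = definitely absent.
    fn contains<T: Hash + ?Sized>(&self, item: &T) -> bool {
        self.indices(item)
            .iter()
            .all(|&idx| (self.words[(idx / 64) as usize] & (1u64 << (idx % 64))) != 0)
    }
}
```

Only two hash computations are needed per item regardless of k, which is the point of double hashing; the quality of the k positions then rests on h1 and h2 being reasonably independent.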

Benchmarks

Run benchmarks to compare hashers and parallel vs sequential operations:

# Compare different hash builders
cargo bench --features "fast-ahash,fast-xxh3" bloom_hashers

# Compare parallel vs sequential operations  
cargo bench --features rayon parallel_bloom

API Reference

Core Types

  • BloomFilter<S> - Main Bloom filter, generic over hasher type S
  • BitSet - Underlying bit storage with optimized operations

Key Methods

Insertion

  • insert(&item) - Insert a single item
  • insert_batch(items) - Parallel batch insert (rayon feature)

Membership

  • contains(&item) - Returns true if the item is possibly in the set (false positives can occur); false means it was definitely never inserted
  • contains_batch(items) - Parallel batch check (rayon feature)
  • contains_all(items) - Check if all items are present (rayon feature)

Set Operations

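  • union_inplace(&other) - Merge with another filter (bitwise OR); both filters need the same size, hash count, and hasher state
  • intersect_inplace(&other) - Bitwise AND; approximates the set of common elements, with a false positive rate at least as high as a filter built from those elements directly
  • clear() - Reset all bits, removing every item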
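At the bit level, these set operations reduce to word-wise bitwise operations. The free functions below are hypothetical stand-ins operating on raw u64 word slices, a sketch of the idea rather than bloomz's code.

```rust
/// Hypothetical word-level union: a bit is set if it is set in either filter.
/// Both slices must come from filters with the same size, k, and hasher state.
fn union_inplace(dst: &mut [u64], other: &[u64]) {
    assert_eq!(dst.len(), other.len(), "filters must have the same size");
    for (d, o) in dst.iter_mut().zip(other) {
        *d |= *o;
    }
}

/// Hypothetical word-level intersection: a bit survives only if set in both.
fn intersect_inplace(dst: &mut [u64], other: &[u64]) {
    assert_eq!(dst.len(), other.len(), "filters must have the same size");
    for (d, o) in dst.iter_mut().zip(other) {
        *d &= *o;
    }
}

/// Hypothetical clear: zero every word.
fn clear(dst: &mut [u64]) {
    dst.fill(0);
}
```

OR is exact: the result has exactly the bits a single filter would have after inserting both sets. AND only approximates the intersection, because a bit can be set in both filters by different items.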

Serialization

  • to_bytes() / from_bytes() - Binary format
  • Serde support for JSON/other formats

Mathematical Functions

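use bloomz::{math, BloomFilter};

// Example inputs: expected number of items and target false positive rate
let n_items = 1_000;
let false_positive_rate = 0.01;

// Calculate optimal parameters
let m = math::optimal_m(n_items, false_positive_rate); // number of bits
let k = math::optimal_k(m, n_items);                   // number of hash functions

let filter = BloomFilter::new(m, k);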
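For reference, the standard sizing formulas for n items and a target false positive rate p are m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hash functions, with an expected false positive rate of about (1 - e^(-kn/m))^k. The sketch below implements these textbook formulas; it is not necessarily how `bloomz::math` computes them (its rounding, for example, may differ).

```rust
use std::f64::consts::LN_2;

/// Textbook optimal bit count: m = ceil(-n * ln(p) / (ln 2)^2).
fn optimal_m(n: u64, p: f64) -> u64 {
    assert!(n > 0 && p > 0.0 && p < 1.0, "need n > 0 and 0 < p < 1");
    (-(n as f64) * p.ln() / (LN_2 * LN_2)).ceil() as u64
}

/// Textbook optimal hash count: k = round((m / n) * ln 2), at least 1.
fn optimal_k(m: u64, n: u64) -> u32 {
    assert!(n > 0, "need n > 0");
    ((m as f64 / n as f64) * LN_2).round().max(1.0) as u32
}

/// Expected false positive rate after inserting n items: (1 - e^(-k n / m))^k.
fn expected_fp_rate(m: u64, n: u64, k: u32) -> f64 {
    (1.0 - (-(k as f64) * n as f64 / m as f64).exp()).powi(k as i32)
}
```

For n = 1000 and p = 0.01 this gives m = 9586 bits and k = 7, and plugging those back into `expected_fp_rate` yields roughly 1%.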

Feature Flags

Feature      Description                  Dependencies
serde        JSON/binary serialization    serde, serde_json
rayon        Parallel batch operations    rayon
fast-ahash   AHash hasher support         ahash
fast-xxh3    xxHash hasher support        xxhash-rust

Examples

See src/main.rs for a complete web crawler URL filter demo:

# Basic demo
cargo run

# With all features
cargo run --features "rayon,serde,fast-ahash"

Use Cases

  • Web Crawlers: Avoid revisiting URLs
  • Caching: Quick "not in cache" checks
  • Databases: Reduce disk lookups
  • Networking: Packet deduplication
  • Analytics: Unique visitor tracking

Contributing

Contributions welcome! Before submitting a pull request, please:

  • Run cargo test --all-features
  • Run cargo bench --all-features
  • Add tests for new features
  • Update documentation

License

MIT License - see LICENSE file.


🌸 Bloomz: Where speed meets flexibility in Rust Bloom filters!
