| Crates.io | nano-hyperloglog |
| lib.rs | nano-hyperloglog |
| version | 0.1.0 |
| created_at | 2025-10-03 21:10:54.165587+00 |
| updated_at | 2025-10-03 21:10:54.165587+00 |
| description | A Redis-compatible HyperLogLog service with pluggable storage backends |
| homepage | https://github.com/aovestdipaperino/nano-hyperloglog |
| repository | https://github.com/aovestdipaperino/nano-hyperloglog |
| max_upload_size | |
| id | 1867202 |
| size | 1,615,068 |
A high-performance HyperLogLog implementation in Rust for cardinality estimation with pluggable storage backends.
HyperLogLog is a probabilistic data structure that estimates the cardinality (number of unique elements) of a dataset. It's incredibly space-efficient: count billions of unique items using just kilobytes of memory, with typical accuracy within 2% of the true count.
Use cases:

- Counting unique visitors or users (e.g., daily or monthly actives)
- Deduplicated counts across distributed systems, merged without double-counting
- Distinct-value estimates over large streams where exact counting is too expensive
Add to your Cargo.toml:
```toml
[dependencies]
nano-hyperloglog = "0.1"
```
```rust
use hyperloglog::HyperLogLog;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create with precision 14 (16KB memory, ~0.8% error)
    let mut hll = HyperLogLog::new(14)?;

    // Add elements
    for i in 0..100_000 {
        hll.add(&i);
    }

    // Get estimated count
    let count = hll.count();
    println!("Estimated: {} (actual: 100,000)", count);
    // Output: Estimated: 99,723 (actual: 100,000) - 0.28% error!

    Ok(())
}
```
Perfect for distributed systems:
```rust
use hyperloglog::HyperLogLog;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Three servers tracking visitors
    let mut server1 = HyperLogLog::new(14)?;
    let mut server2 = HyperLogLog::new(14)?;
    let mut server3 = HyperLogLog::new(14)?;

    // Each sees different (potentially overlapping) users
    for i in 0..5000 { server1.add(&format!("user_{}", i)); }
    for i in 2500..7500 { server2.add(&format!("user_{}", i)); }
    for i in 5000..10000 { server3.add(&format!("user_{}", i)); }

    // Merge to get total unique users
    server1.merge(&server2)?;
    server1.merge(&server3)?;

    println!("Total unique users: {}", server1.count());
    // Output: Total unique users: ~10,000

    Ok(())
}
```
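A HyperLogLog merge takes the element-wise maximum of the two register arrays, so merging is idempotent and order-independent: merging the same sketch twice, or merging in a different order, yields the same final count.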
Persist sketches with a pluggable storage backend:

```rust
use hyperloglog::{HyperLogLog, Storage, storage::FileStorage};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let storage = FileStorage::new("./data").await?;

    // Create and populate
    let mut hll = HyperLogLog::new(14)?;
    hll.add_str("user123");
    hll.add_str("user456");

    // Persist to disk
    storage.store("daily_visitors", &hll).await?;

    // Load later
    let loaded = storage.load("daily_visitors").await?;
    println!("Count: {}", loaded.count());

    Ok(())
}
```
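One pattern this enables is time-bucketed sketches: store one sketch per day under a dated key, then merge the loaded sketches for a rolling window. A sketch of the idea, using only the calls shown above (the key names are illustrative):

```rust
use hyperloglog::{HyperLogLog, Storage, storage::FileStorage};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let storage = FileStorage::new("./data").await?;

    // Hypothetical per-day keys written by some other job
    let days = ["visitors_2025-10-01", "visitors_2025-10-02", "visitors_2025-10-03"];

    // Start from an empty sketch and fold the daily sketches into it
    let mut rollup = HyperLogLog::new(14)?;
    for key in days {
        let daily = storage.load(key).await?;
        rollup.merge(&daily)?;
    }

    println!("Unique visitors across all days: {}", rollup.count());
    Ok(())
}
```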
Choosing a precision:

| Precision | Memory | Standard Error | Use Case |
|---|---|---|---|
| 10 | 1 KB | ±3.25% | Quick estimates, tight memory |
| 12 | 4 KB | ±1.63% | Good balance |
| 14 | 16 KB | ±0.81% | Default - recommended |
| 16 | 64 KB | ±0.41% | High accuracy needed |
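These figures follow from the standard HyperLogLog error formula, 1.04/√m, where m = 2^precision is the register count (one byte per register). A quick way to reproduce the table:

```rust
fn main() {
    for p in [10u32, 12, 14, 16] {
        let m = 1u64 << p;                   // number of registers
        let err = 1.04 / (m as f64).sqrt();  // standard error of the estimate
        println!("precision {:2}: {:3} KB, ±{:.2}%", p, m / 1024, err * 100.0);
    }
}
```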
Control what gets compiled:
```toml
[dependencies]
nano-hyperloglog = { version = "0.1", features = ["elasticsearch-storage", "server"] }
```
Available features:
- `file-storage` (default) - File-based persistence
- `elasticsearch-storage` - Elasticsearch backend
- `server` - HTTP server with Redis-compatible API
- `full` - Everything

Run a Redis-compatible HTTP service:
```bash
cargo run --example server --features server
```

Then exercise the Redis-style endpoints with curl:

```bash
# Add elements (PFADD)
curl -X POST http://localhost:3000/pfadd/daily_visitors \
  -H "Content-Type: application/json" \
  -d '{"elements": ["user123", "user456", "user789"]}'

# Get count (PFCOUNT)
curl http://localhost:3000/pfcount/daily_visitors
# {"count": 3}

# Merge multiple HLLs (PFMERGE)
curl -X POST http://localhost:3000/pfmerge/all_visitors \
  -H "Content-Type: application/json" \
  -d '{"source_keys": ["page_home", "page_about"]}'

# Check existence
curl http://localhost:3000/exists/daily_visitors
# true

# List all keys
curl http://localhost:3000/keys
# ["daily_visitors", "all_visitors"]
```
Configure the server via environment variables:

```bash
# Storage backend
STORAGE_BACKEND=file       # or "elasticsearch"
FILE_STORAGE_PATH=./data   # for file backend
ELASTICSEARCH_URL=http://localhost:9200
ELASTICSEARCH_INDEX=hyperloglog

# Server
BIND_ADDRESS=0.0.0.0:3000

cargo run --example server --features server
```
Run examples to see it in action:
```bash
# Basic counting
cargo run --example basic_usage

# Merging from multiple sources
cargo run --example merging

# File storage
cargo run --example file_storage --features file-storage

# Compare precision values
cargo run --example precision_comparison

# HTTP server
cargo run --example server --features server
```
HyperLogLog uses a clever trick based on probability theory:

- Hash each element to a uniformly distributed value.
- Use the first `precision` bits of the hash to select one of 2^precision registers.
- Count the run of leading zeros in the remaining bits; long runs are exponentially rare, so the longest run seen hints at how many distinct hashes were observed.
- Each register keeps the maximum run it has seen; combining all registers via a harmonic mean yields the cardinality estimate.
For technical details, see the algorithm explanation.
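To make the steps concrete, here is a self-contained sketch of the raw estimator. It uses std's `DefaultHasher` and omits the bias corrections a production implementation applies, so it illustrates the idea rather than this crate's actual internals:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const P: u32 = 14;       // precision: top P hash bits pick the register
const M: usize = 1 << P; // number of registers (16,384)

fn hash64<T: Hash>(item: &T) -> u64 {
    let mut h = DefaultHasher::new();
    item.hash(&mut h);
    h.finish()
}

fn main() {
    let mut registers = vec![0u8; M];

    for i in 0..100_000u64 {
        let h = hash64(&i);
        let idx = (h >> (64 - P)) as usize;              // register index from top P bits
        let rank = (h << P).leading_zeros() as u8 + 1;   // leading-zero run in the rest, plus 1
        registers[idx] = registers[idx].max(rank);       // keep the maximum run seen
    }

    // Raw estimate: alpha_m * m^2 / sum(2^-register), via the harmonic mean
    let alpha = 0.7213 / (1.0 + 1.079 / M as f64);
    let sum: f64 = registers.iter().map(|&r| 2f64.powi(-(r as i32))).sum();
    let estimate = alpha * (M as f64).powi(2) / sum;
    println!("Estimated: {:.0} (actual: 100,000)", estimate);
}
```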
Benchmarks on M1 MacBook Pro:
| Operation | Time | Throughput |
|---|---|---|
| Add element | ~15 ns | 66M ops/sec |
| Count (precision 14) | ~8 μs | 125K ops/sec |
| Merge (16KB each) | ~12 μs | 83K ops/sec |
| Serialize to JSON | ~50 μs | 20K ops/sec |
Memory usage is fixed: 2^precision bytes (16KB for precision 14).
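Your numbers will differ by machine. A rough way to sanity-check the add-path throughput yourself, using only the calls from the quick-start example (build with `--release`, or the timing is meaningless):

```rust
use hyperloglog::HyperLogLog;
use std::time::Instant;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut hll = HyperLogLog::new(14)?;

    let n: u64 = 10_000_000;
    let start = Instant::now();
    for i in 0..n {
        hll.add(&i);
    }
    let secs = start.elapsed().as_secs_f64();

    println!("{} adds in {:.2}s ({:.1}M ops/sec)", n, secs, n as f64 / secs / 1e6);
    println!("estimated distinct: {}", hll.count());
    Ok(())
}
```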
| Feature | nano-hyperloglog | Redis PFCOUNT |
|---|---|---|
| Precision configurable | ✅ 4-16 bits | ❌ Fixed at 14 bits |
| Persistent storage | ✅ File/ES | ✅ RDB/AOF |
| HTTP API | ✅ Optional | ❌ |
| Type-safe API | ✅ | ❌ |
| Startup time | < 10ms | ~100ms |
| Memory footprint | ~8MB binary | ~10MB process |
When to use each:

- Use nano-hyperloglog when you want an embedded, type-safe Rust library, a standalone HTTP service, or pluggable storage backends.
- Use Redis PFCOUNT when you already run Redis and want cardinality estimation alongside your existing data.
Contributions welcome! Please:
- Run `cargo fmt` and `cargo clippy`

Licensed under either of:

- Apache License, Version 2.0
- MIT license
at your option.
Inspired by Redis's PFCOUNT and the excellent work of Philippe Flajolet on probabilistic counting algorithms.