| Crates.io | rust-hll |
| lib.rs | rust-hll |
| version | 0.1.0 |
| created_at | 2025-06-02 20:48:59.1857+00 |
| updated_at | 2025-06-02 20:48:59.1857+00 |
| description | A Rust implementation of HLL that is compatible with the Aggregate Knowledge HLL Storage Spec |
| homepage | |
| repository | https://github.com/zach-schoenberger/rust-hll |
| max_upload_size | |
| id | 1698365 |
| size | 7,674,862 |
A Rust implementation of HyperLogLog that is storage-compatible with the Aggregate Knowledge HLL Storage Spec. It is heavily based on the work done in go-hll.
HyperLogLog (HLL) is a fixed-size, set-like structure used for distinct value counting with tunable precision. For example, in 1280 bytes HLL can estimate the count of tens of billions of distinct values with only a few percent error.
In addition to the algorithm proposed in the original paper, this implementation is augmented to improve its accuracy and memory use without sacrificing much speed.
While there are a handful of existing HLL implementations in Rust, none of them (that I have found) implement the AK Storage Spec.
The unified storage format is useful for reading and writing HLLs in a multi-lingual environment. This implementation
allows for seamless integration with other HLL implementations that follow the AK Storage Spec, such as the PostgreSQL HLL
extension.
A good hashing algorithm is crucial to achieving the pseudorandomness that HLL requires in order to perform its calculations. The 64-bit variant of MurmurHash3 is recommended. If using a seed, it must be constant for all inputs to a given HLL. Further, if that HLL is to be unioned, then the same seed must be used for all inputs to the other HLL.
use rust_hll::{Hll, Settings};
fn main() {
// Create settings for the HLL
let settings = Settings::new(
10, // log_2m: number of registers will be 2^10
4, // reg_width: 4 bits per register
-1, // explicit_threshold: auto-calculate threshold
true, // sparse_enabled: use sparse representation
)
.unwrap();
// Create a new HLL with the settings
let mut hll = Hll::new(settings);
// Add elements
hll.add_raw(123456789);
println!("Cardinality: {}", hll.cardinality()); // prints "1"
// Create another HLL and add elements
let mut hll2 = Hll::new(settings);
hll2.add_raw(123456789);
hll2.add_raw(987654321);
// Union HLLs
hll2.union(true, &hll).unwrap();
println!("Cardinality after union: {}", hll2.cardinality()); // prints "2"
// Serialize to bytes
let bytes = hll2.to_bytes();
// Deserialize from bytes
let hll3 = Hll::from_bytes(&bytes).unwrap();
println!("Cardinality after deserialization: {}", hll3.cardinality()); // prints "2"
}
The HLL implementation can be configured through the Settings struct:
log_2m: Determines the number of registers in the HLL (2^log_2m). Must be between 4 and 31.reg_width: Number of bits dedicated to each register value. Must be between 1 and 8.explicit_threshold: Cardinality at which the HLL transitions from explicit to probabilistic storage. Use -1 for auto-calculation.sparse_enabled: Whether to use sparse representation. When true, conversion thresholds are automatically calculated.The implementation uses three storage types that automatically transition based on the data:
Released under the MIT license.