ultraloglog

Crates.io: ultraloglog
lib.rs: ultraloglog
version: 0.1.6
created_at: 2025-01-31 14:32:10.926166+00
updated_at: 2025-10-26 07:26:15.352592+00
description: Rust implementation of the UltraLogLog algorithm
repository: https://github.com/waynexia/ultraloglog
id: 1537567
size: 109,596
Ruihang Xia (waynexia)

UltraLogLog

Rust implementation of the UltraLogLog algorithm. UltraLogLog is more space-efficient than the widely used HyperLogLog, but can be slower. Either the FGRA estimator or the maximum-likelihood (MLE) estimator can be used.
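To make the space-efficiency claim concrete, here is a rough, standalone sketch of the arithmetic. It does not use this crate. It assumes the layout described in the paper (2^p registers of 8 bits each for UltraLogLog, compared against HyperLogLog with 6-bit registers), and the memory-variance products below are approximate values quoted from memory of the paper, so treat them as assumptions and check them against the reference before relying on them.

```rust
// ASSUMPTION: approximate memory-variance products (register bits times
// relative variance). Verify against the paper before relying on them.
const MVP_HLL6: f64 = 6.48; // HyperLogLog, 6-bit registers
const MVP_ULL_ML: f64 = 4.63; // UltraLogLog, maximum-likelihood estimator
const MVP_ULL_FGRA: f64 = 4.90; // UltraLogLog, FGRA estimator

/// Number of registers for precision parameter `p` (m = 2^p).
fn register_count(p: u32) -> u64 {
    1u64 << p
}

/// Register payload in bytes for UltraLogLog: one 8-bit register each.
fn ull_register_bytes(p: u32) -> u64 {
    register_count(p)
}

/// Register payload in bytes for HyperLogLog with densely packed 6-bit registers.
fn hll6_register_bytes(p: u32) -> u64 {
    (register_count(p) * 6 + 7) / 8
}

/// Relative space saving at equal estimation error, given two memory-variance products.
fn space_saving(mvp_new: f64, mvp_baseline: f64) -> f64 {
    1.0 - mvp_new / mvp_baseline
}

fn main() {
    for p in [6, 12] {
        println!(
            "p = {p}: ULL {} bytes, HLL (6-bit) {} bytes",
            ull_register_bytes(p),
            hll6_register_bytes(p)
        );
    }
    println!(
        "saving at equal error: ML {:.1}%, FGRA {:.1}%",
        100.0 * space_saving(MVP_ULL_ML, MVP_HLL6),
        100.0 * space_saving(MVP_ULL_FGRA, MVP_HLL6)
    );
}
```

At the same p an UltraLogLog sketch is larger than a 6-bit HyperLogLog sketch; the saving comes from its lower estimation error per bit, so a smaller p reaches the same accuracy.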

Usage

use ultraloglog::{Estimator, MaximumLikelihoodEstimator, OptimalFGRAEstimator, UltraLogLog};

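// Create a sketch with precision parameter 6
let mut ull = UltraLogLog::new(6).unwrap();

// Add values; calls to add_value can be chained
ull.add_value("apple")
    .add_value("banana")
    .add_value("cherry")
    .add_value("033");

// Estimate the number of distinct values added so far
let est = ull.get_distinct_count_estimate();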
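For small inputs, an estimate can be sanity-checked against an exact count. The following is a minimal standalone sketch using only the standard library; the helper names and the sample estimate of 4.2 are hypothetical and are not part of this crate or output from it.

```rust
use std::collections::HashSet;

/// Exact number of distinct items, usable as a baseline on small inputs.
fn exact_distinct_count<'a>(values: impl IntoIterator<Item = &'a str>) -> usize {
    values.into_iter().collect::<HashSet<_>>().len()
}

/// Relative error of an approximate estimate against the exact count.
fn relative_error(estimate: f64, exact: usize) -> f64 {
    (estimate - exact as f64).abs() / exact as f64
}

fn main() {
    // Same values as in the usage example, plus one duplicate.
    let values = ["apple", "banana", "cherry", "033", "apple"];
    let exact = exact_distinct_count(values);
    println!("exact distinct count: {exact}");

    // Hypothetical estimate, standing in for a value returned by a sketch.
    let estimate = 4.2;
    println!("relative error: {:.3}", relative_error(estimate, exact));
}
```

An exact set grows with the number of distinct items, which is the cost a sketch avoids, so this check only makes sense on test-sized data.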

Enable the serde feature to save a sketch to disk and load it back later.

use ultraloglog::{Estimator, MaximumLikelihoodEstimator, OptimalFGRAEstimator, UltraLogLog};
use std::fs::{remove_file, File};
use std::io::{BufReader, BufWriter};

let file_path = "test_ultraloglog.bin";

// Create UltraLogLog and add data
let mut ull = UltraLogLog::new(5).expect("Failed to create ULL");
ull.add(123456789);
ull.add(987654321);
let original_estimate = ull.get_distinct_count_estimate();

// Save to file using writer
let file = File::create(file_path).expect("Failed to create file");
let writer = BufWriter::new(file);
ull.save(writer).expect("Failed to save UltraLogLog");

// Load from file using reader
let file = File::open(file_path).expect("Failed to open file");
let reader = BufReader::new(file);
let loaded_ull = UltraLogLog::load(reader).expect("Failed to load UltraLogLog");
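let loaded_estimate = loaded_ull.get_distinct_count_estimate();

// The loaded sketch should report the same estimate as the original
assert_eq!(original_estimate, loaded_estimate);

// Clean up the temporary file
remove_file(file_path).expect("Failed to remove file");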

Python Bindings

This crate also provides Python bindings for the UltraLogLog algorithm using PyO3. See example.py for usage.

import ultraloglog

# Create a new UltraLogLog sketch
ull = ultraloglog.PyUltraLogLog(12)  # precision parameter

# Add values
ull.add_str("hello")
ull.add_int(42)
ull.add_float(3.14)

# Get estimated count
print(f"Estimated distinct count: {ull.count()}")

Installation

Using pip

This package is available on PyPI as ultraloglog. You can install it using:

pip install ultraloglog

From Source

uv is recommended to manage virtual environments.

  1. Install Rust and maturin: pip install maturin

  2. Build and install: maturin develop --release

64-bit hash function

As mentioned in the paper, a high-quality 64-bit hash function is key to the UltraLogLog algorithm. We tested several modern 64-bit hash libraries and found that xxhash-rust (default) and wyhash-rs worked well. However, users can easily replace the default xxhash-rust with polymurhash, komihash, ahash, or t1ha, among others. See the testing section for details.
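To illustrate why hash quality matters, the sketch below shows how a HyperLogLog-style sketch typically consumes a 64-bit hash: the top p bits pick a register and the leading zeros of the remaining bits give a rank. This is a simplified illustration, not this crate's actual register update. It uses the standard library's DefaultHasher purely as a stand-in (it is not one of the hash libraries named above), and hash64 and split_hash are hypothetical helper names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hash any `Hash` value to 64 bits with the standard library's default hasher.
/// Stand-in only: swap in a stronger or faster 64-bit hash as needed.
fn hash64<T: Hash + ?Sized>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Simplified split of a 64-bit hash for precision `p`:
/// the top `p` bits select a register, and the number of leading zeros
/// in the remaining `64 - p` bits (plus one) gives the rank.
fn split_hash(hash: u64, p: u32) -> (usize, u32) {
    assert!((1..64).contains(&p), "p must be between 1 and 63");
    let index = (hash >> (64 - p)) as usize;
    // Left-align the remaining bits, then cap the zero count at their width.
    let rest = hash << p;
    let rank = rest.leading_zeros().min(64 - p) + 1;
    (index, rank)
}

fn main() {
    let p = 6;
    for value in ["apple", "banana", "cherry"] {
        let h = hash64(value);
        let (index, rank) = split_hash(h, p);
        println!("{value}: hash = {h:#018x}, register = {index}, rank = {rank}");
    }
}
```

If the hash bits are biased or correlated, register indices cluster and ranks skew, which distorts the estimate. That is why a well-mixed 64-bit hash is a prerequisite.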

Reference

Ertl, O., 2024. UltraLogLog: A Practical and More Space-Efficient Alternative to HyperLogLog for Approximate Distinct Counting. Proceedings of the VLDB Endowment, 17(7), pp.1655-1668.
