bitcoinleveldb-histogram

Crates.io	bitcoinleveldb-histogram
lib.rs	bitcoinleveldb-histogram
version	0.1.19
created_at	2023-01-18 19:19:03.761026+00
updated_at	2025-12-01 17:01:27.578806+00
description	LevelDB-compatible fixed-bucket histogram for fast online statistics (min, max, average, percentiles, standard deviation) and mergeable distributions, used by bitcoinleveldb within bitcoin-rs.
homepage
repository	https://github.com/klebs6/bitcoin-rs
max_upload_size
id	761961
size	202,810

(klebs6)

documentation

https://docs.rs/bitcoinleveldb-histogram

README

bitcoinleveldb-histogram

Fast, deterministic histogram statistics for Bitcoin-leveldb style latency and value distributions.

This crate provides a faithful Rust implementation of the histogram used inside LevelDB, adapted for the bitcoinleveldb subsystem of bitcoin-rs. It is designed for profiling and characterizing distributions (e.g., operation latencies, sizes, or fees) with negligible runtime overhead and a compact internal representation.

Features

Fixed-bucket histogram with HISTOGRAM_NUM_BUCKETS logarithmically (LevelDB-style) spaced bucket limits BUCKET_LIMIT.
Online accumulation of statistics:
- min, max
- sample count num
- linear sum sum
- quadratic sum sum_squares
Derived statistics without storing raw samples:
- arithmetic mean (average)
- standard deviation
- median
- arbitrary percentile
Deterministic merge of histograms without re-processing samples.
Human-readable ASCII representation suitable for logs.
no_std-friendly core (uses core::fmt::Write), with logging controlled by the caller's logging setup.

The design is suitable for high-throughput systems where allocating or storing all samples is undesirable. The histogram approximates the distribution using bucket counts and pre-defined bucket limits, providing excellent tradeoffs between precision, memory, and CPU cost.

Data model

pub struct Histogram {
    min:         f64,
    max:         f64,
    num:         f64,
    sum:         f64,
    sum_squares: f64,
    buckets:     [f64; HISTOGRAM_NUM_BUCKETS],
}

Semantics:

min, max: Smallest and largest sample ever added. When empty, min is initialized to the largest bucket limit and max is 0.0.
num: Total number of samples (as f64 to match LevelDB and simplify arithmetic with large counts).
sum: Sum of all values.
sum_squares: Sum of squares of all values, enabling variance computation without storing samples.
buckets: Per-bucket counts; bucket boundaries are provided by BUCKET_LIMIT and follow LevelDB.

This is an online algorithm: each add call updates all sufficient statistics in O(number_of_buckets) worst-case for bucketing (linear scan) but typically small constant time, and O(1) for the other aggregates.

The standard deviation implementation uses the identity:

[ \operatorname{Var}(X) = E[X^2] - (E[X])^2 ]

with

[ E[X] = \frac{\text{sum}}{\text{num}},\quad E[X^2] = \frac{\text{sum_squares}}{\text{num}} ]

and guarded against small negative values from floating-point error.

Basic usage

Add this crate to your Cargo.toml:

[dependencies]
bitcoinleveldb-histogram = "0.1.19"

Create a histogram and feed samples:

use bitcoinleveldb_histogram::Histogram;

fn main() {
    let mut h = Histogram::default();

    // Simulate some latency / cost data
    for value in [1.0, 5.0, 10.0, 10.0, 100.0] {
        h.add(value);
    }

    println!("count = {}", h.num);
    println!("min   = {}", h.min);
    println!("max   = {}", h.max);
    println!("avg   = {}", h.average());
    println!("p50   = {}", h.median());
    println!("p99   = {}", h.percentile(99.0));
    println!("std   = {}", h.standard_deviation());

    // Pretty ASCII representation, similar to LevelDB output
    println!("{}", h.to_string());
}

Example output (shape only, exact values depend on bucket configuration):

Count: 5  Average: 25.2000  StdDev: 38.42
Min: 1.0000  Median: 10.0000  Max: 100.0000
------------------------------------------------------
[       0,       1 )       0   0.000%   0.000% 
[       1,      10 )       2  40.000%  40.000% ########
[      10,     100 )       2  40.000%  80.000% ########
[     100,    1000 )       1  20.000% 100.000% ####

Operations

Construction and reset

impl Default for Histogram {
    fn default() -> Self;
}

impl Histogram {
    pub fn clear(&mut self);
}

Histogram::default() initializes an empty histogram: internal counters zeroed, min set to the largest bucket limit, max to 0.0.
clear() resets the histogram to this empty state, retaining the allocated bucket array.

Adding samples

impl Histogram {
    pub fn add(&mut self, value: f64);
}

Performs a linear search over BUCKET_LIMIT to locate the bucket index b whose range contains value.
Increments buckets[b].
Updates min, max, num, sum, and sum_squares.

The linear search mirrors LevelDB: the bucket count is small and the branch-predictable loop is faster than a generic binary search for the typical usage patterns.

Merging histograms

impl Histogram {
    pub fn merge(&mut self, other: &Histogram);
}

Component-wise aggregation of sufficient statistics:
- num += other.num
- sum += other.sum
- sum_squares += other.sum_squares
- per-bucket addition: self.buckets[i] += other.buckets[i]
min/max combined as the global extrema across both histograms.
If other.num == 0.0, merge is a no-op.

This makes histogram aggregation trivial in multi-threaded systems: each worker thread can maintain its own Histogram, then a coordinator merges them at the end, approximating the global distribution with no raw data transfer.

Descriptive statistics

impl Histogram {
    pub fn median(&self) -> f64;
    pub fn percentile(&self, p: f64) -> f64;
    pub fn average(&self) -> f64;
    pub fn standard_deviation(&self) -> f64;
}

Average

If num == 0.0, returns 0.0.
Otherwise sum / num.

Standard deviation

If num == 0.0, returns 0.0.
Uses the E[X^2] - (E[X])^2 identity (see above).
Clamps negative variance from floating-point error to 0.0 before sqrt.

Mathematically, this computes the population standard deviation, not the Bessel-corrected sample standard deviation.

Percentile & median

median() is a shorthand for percentile(50.0).
percentile(p):
- Returns 0.0 if num <= 0.0.
- Clamps p to [0.0, 100.0].
- Finds the first bucket where the cumulative count crosses threshold = num * (p / 100.0).
- Linearly interpolates inside that bucket based on position within its count range.
- Clamps the interpolated value to [min, max] to avoid leaking outside the observed range.
- If no bucket crosses the threshold due to numerical edge cases, falls back to max.

Thus percentiles are approximated from the histogram. When bucket spacing is logarithmic, lower percentiles are more precise for small values; higher percentiles focus resolution where buckets are dense.

String representation

impl Histogram {
    pub fn to_string(&self) -> String;
}

to_string() produces a compact text summary plus an ASCII-art visualization:

Header line with Count, Average, and StdDev.
Second line with Min, Median, and Max.
A row per non-empty bucket:
- Half-open interval [left, right).
- Absolute count in that bucket.
- Bucket-local percentage of total.
- Cumulative percentage up to and including that bucket.
- A bar of # characters (20 total marks correspond to 100% of the distribution in a single bucket).

This representation is intended for logs and interactive diagnostics.

Logging behavior

The crate uses logging macros (e.g., trace!, debug!, warn!) to expose internal events and edge cases:

trace! at entry/exit of methods, and for detailed computation steps.
debug! for noteworthy but non-fatal situations (e.g., percentile on empty histogram).
warn! if a percentile outside [0,100] is requested.

Exact logging backend configuration is left to the embedding application; if no logger is configured, logs will be ignored as usual.

Integration examples

Profiling latency in a storage engine

use bitcoinleveldb_histogram::Histogram;
use std::time::{Duration, Instant};

fn profile_ops<F: Fn()>(iterations: usize, op: F) -> Histogram {
    let mut h = Histogram::default();

    for _ in 0..iterations {
        let start = Instant::now();
        op();
        let elapsed = start.elapsed();
        let micros = elapsed.as_secs_f64() * 1_000_000.0;
        h.add(micros);
    }

    h
}

fn main() {
    let histogram = profile_ops(10_000, || {
        // your database or network operation here
    });

    println!("Latency distribution (µs):\n{}", histogram.to_string());
}

Aggregating histograms across threads

use bitcoinleveldb_histogram::Histogram;
use std::thread;

fn main() {
    let threads = 4;
    let per_thread = 1000;

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            thread::spawn(move || {
                let mut h = Histogram::default();
                for i in 0..per_thread {
                    h.add((i % 100) as f64);
                }
                h
            })
        })
        .collect();

    let mut global = Histogram::default();

    for handle in handles {
        let local = handle.join().expect("thread panicked");
        global.merge(&local);
    }

    println!("Global distribution:\n{}", global.to_string());
}

Relationship to LevelDB and bitcoin-rs

This crate mirrors the histogram design from LevelDB, making it straightforward to:

Port profiling and diagnostic tooling from C++ LevelDB to Rust.
Interpret distributions in bitcoin-rs using the same conventions as upstream components.

If you are interacting directly with LevelDB or with systems already using its histogram output, this crate should produce comparable behavior and text representations, subject to differences in bucket configuration.

License

This crate is distributed under the MIT license.

For source code, issues, and contributions, see the parent repository:

Repository: https://github.com/klebs6/bitcoin-rs

Commit count: 0

bitcoinleveldb-histogram

documentation

README

bitcoinleveldb-histogram

Features

Data model

Basic usage

Operations

Construction and reset

Adding samples

Merging histograms

Descriptive statistics

Average

Standard deviation

Percentile & median

String representation

Logging behavior

Integration examples

Profiling latency in a storage engine

Aggregating histograms across threads

Relationship to LevelDB and bitcoin-rs

License

cargo fmt