zw-fast-quantile

Crates.iozw-fast-quantile
lib.rszw-fast-quantile
version0.2.3
sourcesrc
created_at2021-11-14 22:49:17.021854
updated_at2021-12-30 00:25:06.120327
descriptionZhang-Wang fast quantile algorithm in Rust
homepage
repositoryhttps://github.com/MnO2/zw-fast-quantile
max_upload_size
id481894
size42,651
Paul Meng (MnO2)

documentation

README

zw-fast-quantile

CI Crates.io

Zhang Wang Fast Approximate Quantiles Algorithm in Rust: link to reference. It's the original algorithm inspired the techniques in Tensorflow Boosted Trees

Installation

Add this to your Cargo.toml:

[dependencies]
zw-fast-quantile = "0.2"

then you are good to go. If you are using Rust 2015 you have to extern crate zw-fast-quantile to your crate root as well.

Example

There are two implementations in this library: FixedSizeEpsilonSummary is with the size of the stream known before hand, and UnboundEpsilonSummary is for the stream with unknown size. You can adjust the error rate epsilon of your own to trade-off between space and accuracy.

let epsilon = 0.1;
let n = 10;
let mut s = FixedSizeEpsilonSummary::new(n, epsilon);
for i in 1..=n {
    s.update(i);
}

let ans = s.query(0.0);
let expected = 1;
assert!(expected == ans);
let epsilon = 0.1;
let n = 10;
let mut s = UnboundEpsilonSummary::new(epsilon);
for i in 1..=n {
    s.update(i);
}

let ans = s.query(0.0);
let expected = 1;
assert!(expected == ans);

Benchmarks

We benchmark against GK01 implemented in quantiles. To test the UPDATE, we insert 5000 values in order with error rate of 0.01. GK01 is roughly 2.6x slower than ZW.

zw unbound quantile update
                        time:   [60.780 us 60.855 us 60.936 us]
                        change: [-1.4032% -0.9510% -0.5005%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) high mild
  6 (6.00%) high severe
gk quantile update      time:   [156.84 us 157.02 us 157.24 us]
                        change: [-0.1907% -0.0503% +0.0969%] (p = 0.50 > 0.05)
                        No change in performance detected.
Found 11 outliers among 100 measurements (11.00%)
  6 (6.00%) high mild
  5 (5.00%) high severe

As for the query time, we query 11 quantiles from 0.0, 0.1 ... 1.0. GK01 is about 1.5x slower than ZW.

zw unbound quantile query
                        time:   [229.62 ns 230.16 ns 230.77 ns]
                        change: [+1.3422% +1.8105% +2.2504%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  3 (3.00%) high mild
  8 (8.00%) high severe
gk quantile query       time:   [350.21 ns 350.48 ns 350.76 ns]
                        change: [-0.4638% -0.3109% -0.1670%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low severe
  2 (2.00%) high mild
  5 (5.00%) high severe
Commit count: 63

cargo fmt