Crates.io | wavelet-matrix |
lib.rs | wavelet-matrix |
version | 0.4.7 |
source | src |
created_at | 2017-10-29 17:04:55.193378 |
updated_at | 2017-12-21 11:56:45.302736 |
description | A wavelet matrix implementation. Supports various near-O(1) queries on large number of symbols or integers. |
homepage | |
repository | https://github.com/sekineh/wavelet-matrix-rs |
max_upload_size | |
id | 37359 |
size | 174,230 |
It provides the various analytics on very large sequence of unsigned integers in constant time.
After adding to Cargo.toml, add this line to lib.rs or main.rs.
extern crate wavelet_matrix;
See crate document top for further examples.
Overall Performance
Error evaluation of O(1) SUM
Given an unsigned integer sequence T, it provides the following queries.
.len()
:
T
..lookup(pos)
:
T[pos]
.Counting is performed in O(1).
.count(start..end, value)
:
e
which satisfies e == value
included in T[start..end]
.count_prefix(start..end, value, ignore_bit)
:
e
which satisfies e >> ignore_bit == value >> ignore_bit
included in T[start..end]
192.168.10.0/24
. In this case, the ignore_bit will be 8..count_lt(start..end, value)
:
e
which satisfies e < value
included in T[start..end]
.count_gt(start..end, value)
:
e
which satisfies e > value
included in T[start..end]
.count_range(start..end, val_start..val_end)
:
e
which satisfies val_start <= e < val_end
included in T[start..end]
Searching is performed in O(1) per a next index.
.search(start..end, value)
:
e
which satisfies e == value
included in T[start..end]
.search_prefix(start..end, value, ignore_bit)
:
e
which satisfies e >> ignore_bit == value >> ignore_bit
included in T[start..end]
Ranking is performed in roughly O(1) with regard to the number of elements n
.
.max_k(start..end, val_start..val_end, k)
:
(value, count)
pairs in descending order.val_start..val_end
..min_k(start..end, val_start..val_end, k)
:
(value, count)
pairs in ascending order.val_start..val_end
..top_k()
is also performed in O(1) in best case, but may take O(n) in the worst case where every value occurs only once!
.top_k(start..end, val_start..val_end, k)
:
(value, count)
pairs in most-frequent-one-first order.val_start..val_end
.To achieve O(1) performance regardless of the number of unique values, use .top_k_ranges()
instead:
.top_k_ranges(start..end, val_start..val_end, k)
:
(v_start..v_end, count)
pairs in most-frequent-one-first order..top_k()
, .top_k_ranges()
returns the exhaustive range set that covers all of the values.val_start..val_end
..median(start..end)
:
T[start..end]
..quantile(start..end, start + (end - start) / 2)
..quantile(start..end, k)
:
T[start..end]
.These methods use .top_k_ranges()
to enumerate the most relevant value ranges.
They are not as accurate as the method used in Experiment 2.
.sum_experiment1(start..end, val_start..val_end, k)
:
T[start..end]
using up to k
wavelet tree nodes.val_start..val_end
are processed.k = m + 1
where m
is the number of values which are unique..mean_experiment1(start..end, val_start..val_end, k)
:
T[start..end]
using up to k
wavelet tree nodes.val_start..val_end
are processed.k = m + 1
where m
is the number of values which are unique..variance_experiment1(start..end, val_start..val_end, k)
:
T[start..end]
using up to k
wavelet tree nodes.val_start..val_end
are processed.k = m + 1
where m
is the number of values which are unique.Improvement over Experiment 1. They use custom node enumerator to minimize the error.
.sum_experiment2(start..end, val_start..val_end, k)
:Improvement over Experiment 2. They use Range<u64>
to tell how accurate the computed value is.
.sum_experiment3(start..end, val_start..val_end, k)
:.rank(pos, value)
: counts value included in T[0..pos]
.
.rank()
always returns 0..select(rank, value)
: return the position of the (rank+1)-th value
.len()
instead of None..median()
and .quantile()
..median()
and .quantile()
. They are quite fast, only take 3-5 us on 16-bit values..top_k_ranges()
which is faster than .top_k()
in worst case..sum_experiment1()
, .mean_experiment1()
and .variance_experiment1()
..sum_experiment2()
..sum_experiment3()
.BENCH.md
bench report.BENCH_SUM.md
bench report..bit_len()
..dim()
..top_k()
, .max_k()
and min_k()
..top_k()
, .max_k()
and min_k()
..search()
and .search_prefix()
..count()
, .count_prefix()
, .count_lt()
, .count_gt()
and .count_range()
.&Vec<u64>
, instead of Vec<u64>
Add Benchmark.
Profiling
Optimize underlying rsdic structure.
Add travis CI.
Add u128 support or arbitrary-length integer support
The fastest implementation on the planet
A Go package for myriad array operations using wavelet trees
excellent slides in Japanese
The original inventor's pdf: