Crates.io | remedian |
lib.rs | remedian |
version | 0.1.0 |
source | src |
created_at | 2024-10-10 04:17:34.149383 |
updated_at | 2024-10-10 04:17:34.149383 |
description | A Rust implementation of The Remedian |
homepage | |
repository | https://github.com/sixfold-origami/remedian |
max_upload_size | |
id | 1403439 |
size | 131,081 |
Remedian is a Rust implementation of The Remedian, a robust method for approximating the median of a large dataset without loading the entire dataset into memory. This is useful when the dataset is too large to hold in memory all at once.
Basic usage:
// The default block is configured with a reasonable b and k for most applications
let mut remedian = RemedianBlock::default();

// Read data points from our data source, and fold them into the remedian
for data_point in some_data_stream {
    remedian.add_sample_point(data_point);
}

// Get our (approximate) answer
let median = remedian.median();
For more details, check out examples/minimal.rs, examples/full.rs, or examples/custom_data.rs.
The Remedian algorithm is quite simple in concept.
It stores k arrays of size b (where k and b are hyperparameters).
Here, we see an example with 4 arrays of size 11:
Figure 1 from the original paper
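As points are read in from the stream, each array is filled in turn: new points go into the first array, and whenever an array fills up, its median is appended to the next array and the full array is emptied.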
In this way, The Remedian can account for b^k sample points while using only b*k space.
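To make the fill-and-promote idea concrete, here is a minimal, self-contained sketch of the general Remedian scheme. It is not this crate's implementation or API: the type `RemedianSketch` and its methods are hypothetical names chosen for illustration, the weighted median used for partially filled buffers is one common choice rather than something confirmed by this README, and b is assumed to be odd so that each buffer has a single middle element.

```rust
/// Illustrative Remedian sketch. All names here are hypothetical;
/// this is NOT the `remedian` crate's API or implementation.
struct RemedianSketch {
    b: usize,               // size of each buffer (assumed odd)
    buffers: Vec<Vec<f64>>, // k buffers; a value in buffer i stands for b^i points
    count: usize,           // number of points accepted so far
}

impl RemedianSketch {
    fn new(b: usize, k: usize) -> Self {
        assert!(b > 0 && k > 0, "b and k must be positive");
        let buffers = (0..k).map(|_| Vec::with_capacity(b)).collect();
        Self { b, buffers, count: 0 }
    }

    /// Maximum number of points that can be absorbed: b^k.
    fn capacity(&self) -> usize {
        self.b.pow(self.buffers.len() as u32)
    }

    /// Number of stored values at most: b * k.
    fn slots(&self) -> usize {
        self.b * self.buffers.len()
    }

    /// Adds one point. Returns false (and ignores the point) once b^k
    /// points have already been accepted.
    fn add(&mut self, x: f64) -> bool {
        if self.count == self.capacity() {
            return false;
        }
        self.count += 1;
        let last = self.buffers.len() - 1;
        let mut carry = x;
        for level in 0..=last {
            self.buffers[level].push(carry);
            if level == last || self.buffers[level].len() < self.b {
                break;
            }
            // This buffer is full: promote its median and empty it.
            self.buffers[level].sort_by(|a, b| a.total_cmp(b));
            carry = self.buffers[level][self.b / 2];
            self.buffers[level].clear();
        }
        true
    }

    /// Weighted median over all buffers, where a value in buffer i
    /// carries weight b^i. Returns None if no points were added.
    fn median(&self) -> Option<f64> {
        let mut weighted: Vec<(f64, usize)> = Vec::new();
        for (level, buf) in self.buffers.iter().enumerate() {
            let weight = self.b.pow(level as u32);
            weighted.extend(buf.iter().map(|&v| (v, weight)));
        }
        weighted.sort_by(|a, b| a.0.total_cmp(&b.0));
        let total: usize = weighted.iter().map(|p| p.1).sum();
        let mut seen = 0;
        for (value, weight) in weighted {
            seen += weight;
            if 2 * seen >= total {
                return Some(value);
            }
        }
        None
    }
}

fn main() {
    // b = 3, k = 2: room for 3^2 = 9 points in 3 * 2 = 6 slots.
    let mut r = RemedianSketch::new(3, 2);
    for x in 1..=9 {
        r.add(x as f64);
    }
    println!("capacity = {}, slots = {}", r.capacity(), r.slots());
    println!("median = {:?}", r.median()); // Some(5.0) for this input
}
```

With the figure's parameters (b = 11, k = 4), the same arithmetic gives room for 11^4 = 14,641 points using only 11 * 4 = 44 stored values.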
This crate has a single feature, which is enabled by default. If it is disabled, eprintln! is used instead.
A sample file of 2000 randomly generated numbers can be found in test_data/2000_values.txt.
The values are uniformly distributed from 1 to 1000, and are used in tests to ensure accuracy.
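An accuracy check of this kind typically compares the approximate result against the exact median of the same values. The sketch below shows one way that comparison might look. It is an assumption about the general approach, not a copy of this crate's tests: `exact_median` and `relative_error` are hypothetical helpers, and the approximate value used in `main` is a made-up placeholder rather than real output.

```rust
/// Exact median of a slice, computed by sorting a copy.
/// Hypothetical helper for illustration; not part of the `remedian` crate.
fn exact_median(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 1 {
        Some(sorted[mid])
    } else {
        // Even length: average the two middle values.
        Some((sorted[mid - 1] + sorted[mid]) / 2.0)
    }
}

/// Relative error of an approximation with respect to a nonzero exact value.
fn relative_error(approx: f64, exact: f64) -> f64 {
    (approx - exact).abs() / exact.abs()
}

fn main() {
    let values = [4.0, 1.0, 3.0, 2.0];
    let exact = exact_median(&values).expect("non-empty input");
    // Placeholder standing in for an approximate median from some estimator.
    let approx = 2.0;
    println!("exact = {exact}, relative error = {}", relative_error(approx, exact));
}
```

Because the values are uniform on 1 to 1000, the exact median should sit near the middle of that range, so a test can reasonably assert that the relative error stays below some chosen tolerance.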