Crates.io | fast-stats |
lib.rs | fast-stats |
version | 0.1.1 |
source | src |
created_at | 2023-03-17 10:22:43.470977 |
updated_at | 2023-05-08 08:42:11.582209 |
description | Rust library for efficient calculation of statistics from streaming data |
repository | https://github.com/jwallbridge/fast-stats |
id | 812606 |
size | 35,597 |
A Rust library for doing statistical analysis on streaming data.
Inspired by the Node.js library node-faststats (https://github.com/bluesmoon/node-faststats), fast-stats is a Rust library for calculating various statistical quantities with an emphasis on speed. It also extends that library by allowing various transformations of the streaming data.
In undertaking statistical analysis on streaming data, fast-stats attempts to leverage two important factors to improve execution time:
This is achieved by maintaining a running cache of several variables as data is streamed into the structure.
As an example, the speed-up in calculating the standard deviation comes from the observation that the variance can be computed with a simple formula from an incremental sum of the values and an incremental sum of the squares of those values, so the data never has to be rescanned.
This is particularly useful in applications that require the calculation of statistical quantities with respect to a contracting, expanding or fixed window of data from a continuously updating stream.
The only trade-off is a small amount of additional memory usage to store these running values.
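The idea of caching a running sum and a running sum of squares can be sketched as follows. This is a minimal illustration only, not the crate's implementation: the RunningWindow type and its methods are hypothetical names, and it assumes the population variance (dividing by n), which may differ from the convention fast-stats uses.

```rust
use std::collections::VecDeque;

/// Hypothetical illustration type; this is NOT the crate's `Stats` struct.
#[derive(Debug, Default)]
pub struct RunningWindow {
    data: VecDeque<f64>,
    sum: f64,    // running sum of the values
    sum_sq: f64, // running sum of the squared values
}

impl RunningWindow {
    pub fn new() -> Self {
        Self::default()
    }

    /// Append a value and update both cached sums in O(1).
    pub fn push(&mut self, x: f64) {
        self.data.push_back(x);
        self.sum += x;
        self.sum_sq += x * x;
    }

    /// Drop the oldest value and update both cached sums in O(1).
    pub fn pop_front(&mut self) -> Option<f64> {
        let x = self.data.pop_front()?;
        self.sum -= x;
        self.sum_sq -= x * x;
        Some(x)
    }

    /// Mean from the cached sum; `None` when the window is empty.
    pub fn mean(&self) -> Option<f64> {
        if self.data.is_empty() {
            None
        } else {
            Some(self.sum / self.data.len() as f64)
        }
    }

    /// Population variance (assumed convention): E[x^2] - (E[x])^2.
    /// Clamped at zero to guard against small negative rounding errors.
    pub fn variance(&self) -> Option<f64> {
        let mean = self.mean()?;
        let n = self.data.len() as f64;
        Some((self.sum_sq / n - mean * mean).max(0.0))
    }

    /// Standard deviation as the square root of the variance.
    pub fn stddev(&self) -> Option<f64> {
        self.variance().map(f64::sqrt)
    }
}
```

Calling push followed by pop_front gives a fixed-size window, and neither call rescans the stored data. One caveat of this formula is that it can lose precision when the mean is large relative to the spread of the values.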
The Stats struct functions like a vector of data that maintains a running update of various values used in the calculation of statistical quantities. Data is added to and removed from this object using push, pop and the other vector methods listed in the Functionality section below.
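// Assumes Stats is exported from the fstats_f64 module.
use fast_stats::fstats_f64::Stats;

// `mut` is required because push_vec modifies the structure.
let mut v = Stats::new();
v.push_vec(vec![4.0, -1.0, 3.0]);
println!("{}", v.mean());
// 2.0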
In addition to fstats_f64, we have included fstats_float, which is generic in the float type at the expense of a tiny amount of speed.
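To illustrate what being generic in the float type means, here is a small sketch of a mean that works for both f32 and f64. The generic_mean function is a hypothetical helper written for this example and says nothing about how fstats_float is implemented; it assumes only standard library traits.

```rust
use std::ops::{Add, Div};

/// Hypothetical helper (not part of fast-stats): mean over any type that
/// supports addition and division and can be built from an f32.
/// Both f32 and f64 satisfy these bounds.
pub fn generic_mean<T>(data: &[T]) -> Option<T>
where
    T: Copy + Add<Output = T> + Div<Output = T> + From<f32>,
{
    if data.is_empty() {
        return None;
    }
    // Sum the values starting from zero in the target float type.
    let sum = data.iter().fold(T::from(0.0_f32), |acc, &x| acc + x);
    // Converting the length through f32 is fine for a sketch, but it
    // loses precision for very long slices.
    Some(sum / T::from(data.len() as f32))
}
```

The same function body serves both float widths, for example generic_mean(&[4.0_f64, -1.0, 3.0]) and generic_mean(&[1.0_f32, 2.0]).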
The following statistical methods are supported:
mean
stddev
min
max
..
The following methods are currently supported and, with the exception of drain, offer the same functionality as the corresponding method in the standard library:
append
drain
insert
is_empty
len
pop
push
push_vec
remove
resize
splice
split_off
swap_remove
truncate
..
New methods not in the standard library include:
trim, which shortens a vector by removing all elements up to a given index.
reset, which clears out all data.
data, which returns a vector of the underlying data.
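The behaviour described for trim can be sketched on a plain Vec using the standard drain method. This is an illustration under assumptions, not the crate's code: trim_front is a hypothetical name, and the sketch assumes the given index is exclusive (the element at that index is kept), which the description above does not specify.

```rust
/// Hypothetical stand-in for `trim` on a plain Vec (not the crate's method).
/// Removes every element before `index` and returns the removed values.
/// Assumption: `index` is exclusive, so the element at `index` is kept.
pub fn trim_front(data: &mut Vec<f64>, index: usize) -> Vec<f64> {
    // Clamp so that an index past the end simply empties the vector.
    let end = index.min(data.len());
    data.drain(..end).collect()
}
```

Returning the removed values would let a structure with cached sums subtract them without rescanning the remaining data.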