Crates.io | staged-sg-filter |
lib.rs | staged-sg-filter |
version | 0.2.3 |
source | src |
created_at | 2024-06-06 05:04:42.349783 |
updated_at | 2024-07-22 19:07:59.292254 |
description | A staged programming implementation for Savitzky-Golay filters. Loops go brrr. |
homepage | |
repository | https://github.com/miguelraz/staged-sg-filter |
max_upload_size | |
id | 1263412 |
size | 342,647 |
A Savitzky-Golay filter that is fast, baby.
All (N,M) parameters are precomputed and pulled in at compile time.
rayon support is available via the rayon feature flag.
Still some SIMD perf left on the table - newer versions will focus on perf.
Remember to compile this with RUSTFLAGS="-C target-cpu=native".
This code is based on Julia code that I adapted with much help from others; see StagedFilters.jl.
The other implementation, savgol-rs, offers this speed:
use savgol_rs::*;

fn main() {
    let input = SavGolInput {
        data: &vec![10.0; 500_000],
        window_length: 3,
        poly_order: 1,
        derivative: 0,
    };
    let result = savgol_filter(&input);
    let data = result.unwrap();
    println!("{:?}", &data[0..10]);
}
takes about 52s, whereas this crate
use staged_sg_filter::sav_gol;

fn main() {
    let n = 100_000_000;
    let v = vec![10.0; n];
    let mut buf = vec![0.0; n];
    sav_gol::<1, 1>(&mut buf, &v);
    println!("{:?}", &buf[0..10]);
}
runs in about 200ms on 200x the data size (100_000_000 vs. 500_000 elements). We're churning through roughly 100_000_000 / 0.2 ≈ 5e8 elements per second, or 5e8 * 10^-9 ≈ 0.5 elements per nanosecond.
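The throughput arithmetic can be reproduced with a small timing harness. The following is a minimal sketch, not the crate's benchmark: `moving_average3` is a hypothetical, dependency-free stand-in for the filter call, and `elements_per_ns` is a helper named here for illustration only.

```rust
use std::time::Instant;

/// Hypothetical stand-in for a filter call: 3-point moving average.
/// Edge elements are copied unchanged (an assumption of this sketch).
fn moving_average3(buf: &mut [f32], data: &[f32]) {
    assert_eq!(buf.len(), data.len());
    let n = data.len();
    if n < 3 {
        buf.copy_from_slice(data);
        return;
    }
    buf[0] = data[0];
    buf[n - 1] = data[n - 1];
    for i in 1..n - 1 {
        buf[i] = (data[i - 1] + data[i] + data[i + 1]) / 3.0;
    }
}

/// Elements processed per nanosecond, given `n` elements in `secs` seconds.
fn elements_per_ns(n: usize, secs: f64) -> f64 {
    n as f64 / secs * 1e-9
}

fn main() {
    // The figures quoted in the text: 100_000_000 elements in about 0.2 s.
    println!("quoted: {:.2} elements/ns", elements_per_ns(100_000_000, 0.2));

    // Time the stand-in on a smaller input.
    let n = 1_000_000;
    let data = vec![10.0_f32; n];
    let mut buf = vec![0.0_f32; n];
    let start = Instant::now();
    moving_average3(&mut buf, &data);
    let secs = start.elapsed().as_secs_f64();
    println!("measured: {:.3} elements/ns", elements_per_ns(n, secs));
    println!("{:?}", &buf[0..3]);
}
```

Build with `--release` before reading anything into the measured number; debug builds will be far slower.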
This can still be improved by about a 4x factor, which would match the current speed of the Julia code.
It's called "staged" because the computation is done in "stages", which allows the compiler to optimize the code a lot more - namely, the use of const generics in Rust provides more opportunities for profitable loop unrolling and proper SIMD lane-width usage.
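To illustrate why compile-time parameters help, here is a minimal sketch of a const-generic kernel. It is not the crate's implementation: `smooth_linear` and its edge handling are hypothetical, and it only covers polynomial order 0 or 1, where the Savitzky-Golay smoothing weights are uniform (1 / (2N + 1)), so no coefficient table is needed.

```rust
/// Hypothetical const-generic smoother with half-window `N` (window = 2N + 1).
/// For polynomial order 0 or 1 the Savitzky-Golay smoothing weights are uniform,
/// so this reduces to a centered moving average.
fn smooth_linear<const N: usize>(buf: &mut [f32], data: &[f32]) {
    assert_eq!(buf.len(), data.len());
    let len = data.len();
    let window = 2 * N + 1;
    if len < window {
        buf.copy_from_slice(data);
        return;
    }
    // The weight depends only on N, which is fixed for each monomorphized copy.
    let weight = 1.0 / window as f32;
    // Edges are copied unchanged (an assumption of this sketch).
    buf[..N].copy_from_slice(&data[..N]);
    buf[len - N..].copy_from_slice(&data[len - N..]);
    for i in N..len - N {
        let mut acc = 0.0_f32;
        // The trip count is the constant 2N + 1, which gives the compiler
        // the chance to unroll and vectorize this inner loop.
        for k in 0..window {
            acc += data[i - N + k];
        }
        buf[i] = acc * weight;
    }
}

fn main() {
    let data: Vec<f32> = (0..8).map(|x| x as f32).collect();
    let mut buf = vec![0.0_f32; data.len()];
    smooth_linear::<1>(&mut buf, &data);
    println!("{:?}", buf);
}
```

Each distinct `N` produces its own specialized copy of the function, which is the trade-off of this approach: more generated code in exchange for loops whose bounds are known to the optimizer.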
You are expected to have FMA and AVX2 compatible hardware (at least). Compile with RUSTFLAGS="-C target-cpu=native" cargo run --release for best performance.
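One way to check whether a machine meets this expectation is runtime feature detection from the standard library. The sketch below is independent of the crate; `has_avx2_and_fma` is a hypothetical helper name, and it assumes that reporting `false` on non-x86 targets is acceptable.

```rust
/// Returns true when the running CPU reports both AVX2 and FMA support.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn has_avx2_and_fma() -> bool {
    is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma")
}

/// Fallback for non-x86 targets: this sketch simply reports false.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn has_avx2_and_fma() -> bool {
    false
}

fn main() {
    if has_avx2_and_fma() {
        println!("AVX2 and FMA detected");
    } else {
        println!("AVX2 and/or FMA not detected; expect reduced performance");
    }
}
```

Note that this checks the CPU at run time, whereas `-C target-cpu=native` decides at compile time which instructions the compiler may emit; a binary built that way should only be run on hardware with the same features.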
Decent efforts have been made to ensure … cargo-remark
… const generics
… coeffs/_f32.rs appropriately and declare them as const.
… coeffs obtained previously.
… buffer
… no_std support: see (Effective Rust link)