| Crates.io | krcflib |
| lib.rs | krcflib |
| version | 4.0.1 |
| created_at | 2025-08-06 08:57:16.663875+00 |
| updated_at | 2025-09-01 16:08:42.665521+00 |
| description | Modified https://github.com/aws/random-cut-forest-by-aws |
| homepage | |
| repository | https://github.com/Bing-su/krcf |
| max_upload_size | |
| id | 1783566 |
| size | 526,525 |
Original: aws/random-cut-forest-by-aws
A Rust implementation of the Random Cut Forest (RCF) algorithm for anomaly detection, forecasting, and data analysis on streaming data. Random Cut Forests are a form of unsupervised machine learning that can detect anomalous data points within a dataset.
This library provides a flexible and efficient implementation of RCF, suitable for multi-dimensional data streams.
Here is a basic example of how to use krcflib to detect anomalies in a multi-dimensional dataset.
First, add krcflib to your Cargo.toml:
[dependencies]
krcflib = "4.0.0" # Replace with the desired version
rand = "0.9"
rand_chacha = "0.9"
Then, you can use it in your code like this:
(Note: the package is named krcflib, but the library is imported as rcflib.)
use rand::{Rng, SeedableRng};
use rand_chacha::ChaCha20Rng;
use rcflib::common::multidimdatawithkey;
use rcflib::rcf::RCFBuilder;
fn main() {
    // Define the parameters for the forest
    let shingle_size = 8;
    let base_dimension = 5;
    let data_size = 100000;
    let number_of_trees = 30;
    let capacity = 256;
    let time_decay = 0.1 / capacity as f64;

    // Build the Random Cut Forest
    let mut forest = RCFBuilder::new(base_dimension, shingle_size)
        .tree_capacity(capacity)
        .number_of_trees(number_of_trees)
        .random_seed(17)
        .parallel_enabled(false)
        .internal_shingling(true)
        .time_decay(time_decay)
        .build_default()
        .unwrap();

    // Generate some sample multi-dimensional data
    let mut rng = ChaCha20Rng::seed_from_u64(42);
    let mut amplitude = Vec::new();
    for _ in 0..base_dimension {
        // rand 0.9 renamed `Rng::gen` to `Rng::random`
        amplitude.push((1.0 + 0.2 * rng.random::<f32>()) * 60.0);
    }
    let data_with_key = multidimdatawithkey::MultiDimDataWithKey::multi_cosine(
        data_size,
        &vec![60; base_dimension],
        &amplitude,
        5.0, // noise
        0,
        base_dimension,
    )
    .unwrap();

    // Process the data through the forest
    for i in 0..data_with_key.data.len() {
        // Get the anomaly score for the current point
        let score = forest.score(&data_with_key.data[i]).unwrap();
        // Update the forest with the new point
        forest.update(&data_with_key.data[i], 0).unwrap();
    }

    println!("Successfully processed {} data points.", forest.entries_seen());
    println!("Final PointStore size: {}", forest.point_store_size());
    println!("Total size of the forest: {} bytes (approx)", forest.size());
}
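The loop in the example computes a score for every point but does not act on it. One simple way to use the scores is to compare them against a fixed cutoff, sketched below. The `flag_anomalies` helper, the `f64` score type, and the threshold value are assumptions made for illustration and are not part of krcflib; the hard-coded scores stand in for values that `forest.score()` would return.

```rust
/// Returns the indices of all scores that exceed `threshold`.
/// Hypothetical helper for illustration; not part of krcflib.
fn flag_anomalies(scores: &[f64], threshold: f64) -> Vec<usize> {
    scores
        .iter()
        .enumerate()
        .filter(|(_, score)| **score > threshold)
        .map(|(index, _)| index)
        .collect()
}

fn main() {
    // Stand-in values; in practice these would be collected from
    // `forest.score(...)` for each point before `forest.update(...)`.
    let scores = [0.8, 0.9, 2.7, 0.85, 3.1];
    // Assumed cutoff; a suitable value depends on the data.
    let threshold = 2.0;
    let flagged = flag_anomalies(&scores, threshold);
    println!("Flagged indices: {:?}", flagged);
}
```

A fixed cutoff is the simplest option. A cutoff derived from the recent distribution of scores may suit data whose baseline drifts over time.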
RCFBuilder: This is the main entry point for creating a RandomCutForest. It allows you to configure the forest's parameters.
base_dimension: The number of features in your data.
shingle_size: The number of recent data points to consider together as a single point in the forest. This is useful for detecting anomalies in time-series data. If you are not using time-series data, you can set this to 1.
number_of_trees: The number of trees in the forest. More trees can lead to more accurate results but will increase memory usage and processing time.
tree_capacity: The maximum number of points stored in each tree.
time_decay: A parameter that determines how much weight is given to older points. A non-zero value helps the model adapt to changing data patterns.
score(): This method takes a data point and returns an anomaly score. The score is a measure of how much the new point deviates from the patterns learned by the forest.
update(): This method adds a new data point to the forest, updating the trees. This allows the model to learn from the new data and adapt over time.
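To make the shingle_size parameter more concrete, the sketch below shows the general idea of shingling: the most recent shingle_size points are concatenated into one vector of base_dimension * shingle_size values. It is a conceptual illustration only. The `Shingler` type is hypothetical, and the oldest-first ordering is an assumption, not a description of how krcflib implements internal shingling.

```rust
use std::collections::VecDeque;

/// Conceptual sliding-window shingler.
/// Hypothetical type for illustration; not krcflib's implementation.
struct Shingler {
    shingle_size: usize,
    window: VecDeque<Vec<f32>>,
}

impl Shingler {
    fn new(shingle_size: usize) -> Self {
        Shingler {
            shingle_size,
            window: VecDeque::with_capacity(shingle_size),
        }
    }

    /// Adds a point and returns the concatenated shingle (oldest point first)
    /// once `shingle_size` points have been seen; returns `None` before that.
    fn push(&mut self, point: &[f32]) -> Option<Vec<f32>> {
        if self.window.len() == self.shingle_size {
            self.window.pop_front(); // drop the oldest point
        }
        self.window.push_back(point.to_vec());
        if self.window.len() < self.shingle_size {
            return None; // not enough history yet
        }
        Some(self.window.iter().flatten().copied().collect())
    }
}

fn main() {
    // base_dimension = 2, shingle_size = 3
    let mut shingler = Shingler::new(3);
    let stream = [[1.0_f32, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]];
    for point in &stream {
        match shingler.push(point) {
            Some(shingle) => println!("{:?}", shingle),
            None => println!("warming up"),
        }
    }
}
```

In this sketch the first two points only fill the window, and each later point yields a 6-value shingle (2 features times 3 points). With shingle_size set to 1, every point is passed through unchanged.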