| Crates.io | avx-clustering |
| lib.rs | avx-clustering |
| version | 0.1.1 |
| created_at | 2025-12-17 03:08:59.231654+00 |
| updated_at | 2025-12-17 03:08:59.231654+00 |
| description | State-of-the-art clustering algorithms for Rust - surpassing scikit-learn, HDBSCAN, and RAPIDS cuML |
| homepage | https://avila.inc |
| repository | https://github.com/avilaops/arxis |
| max_upload_size | |
| id | 1989236 |
| size | 322,798 |
State-of-the-art clustering algorithms for Rust - surpassing scikit-learn, HDBSCAN, and RAPIDS cuML
Pure Rust implementations of advanced clustering algorithms with GPU acceleration, parallel processing, and scientific features.
Add to your `Cargo.toml`:

```toml
[dependencies]
avx-clustering = "0.1"
```

With GPU acceleration:

```toml
[dependencies]
avx-clustering = { version = "0.1", features = ["gpu"] }
```
Available features:

- `gpu` - CUDA GPU acceleration
- `gpu-wgpu` - WGPU cross-platform GPU support
- `full` - All features enabled

```rust
use avx_clustering::prelude::*;
use ndarray::array;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create sample data
    let data = array![
        [1.0, 2.0],
        [1.5, 1.8],
        [5.0, 8.0],
        [8.0, 8.0],
        [1.0, 0.6],
        [9.0, 11.0],
    ];

    // Fit K-Means with 2 clusters
    let kmeans = KMeansBuilder::new(2)
        .max_iter(100)
        .tolerance(1e-4)
        .fit(data.view())?;

    println!("Labels: {:?}", kmeans.labels);
    println!("Centroids:\n{}", kmeans.centroids);

    // Predict new points
    let new_data = array![[0.0, 0.0], [10.0, 10.0]];
    let predictions = kmeans.predict(new_data.view())?;
    println!("Predictions: {:?}", predictions);

    Ok(())
}
```
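For intuition, K-Means alternates two steps: assign each point to its nearest centroid, then recompute each centroid as the mean of its assigned points. A minimal sketch of one such Lloyd iteration in plain Rust (illustrative only, not this crate's implementation):

```rust
// Illustrative sketch of one Lloyd (k-means) iteration on 2-D points.
// Not the avx-clustering implementation; plain slices for clarity.
fn lloyd_step(points: &[[f64; 2]], centroids: &mut [[f64; 2]]) -> Vec<usize> {
    // Assignment step: nearest centroid by squared Euclidean distance.
    let labels: Vec<usize> = points
        .iter()
        .map(|p| {
            centroids
                .iter()
                .enumerate()
                .min_by(|(_, a), (_, b)| {
                    let da = (p[0] - a[0]).powi(2) + (p[1] - a[1]).powi(2);
                    let db = (p[0] - b[0]).powi(2) + (p[1] - b[1]).powi(2);
                    da.partial_cmp(&db).unwrap()
                })
                .map(|(i, _)| i)
                .unwrap()
        })
        .collect();

    // Update step: each centroid becomes the mean of its assigned points.
    for (k, c) in centroids.iter_mut().enumerate() {
        let members: Vec<&[f64; 2]> = points
            .iter()
            .zip(&labels)
            .filter(|(_, &l)| l == k)
            .map(|(p, _)| p)
            .collect();
        if !members.is_empty() {
            let n = members.len() as f64;
            c[0] = members.iter().map(|p| p[0]).sum::<f64>() / n;
            c[1] = members.iter().map(|p| p[1]).sum::<f64>() / n;
        }
    }
    labels
}

fn main() {
    let points = [[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [8.0, 8.0], [1.0, 0.6], [9.0, 11.0]];
    let mut centroids = vec![[1.0, 1.0], [8.0, 9.0]];
    let labels = lloyd_step(&points, &mut centroids);
    println!("{:?}", labels); // points near (1,1) -> 0, near (8,9) -> 1
}
```

Repeating `lloyd_step` until the centroids stop moving (within `tolerance`) is exactly the loop that `max_iter` and `tolerance` bound in the builder above.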
```rust
use avx_clustering::prelude::*;

let data = array![
    [1.0, 2.0],
    [2.0, 2.0],
    [2.0, 3.0],
    [8.0, 7.0],
    [8.0, 8.0],
    [25.0, 80.0], // Noise point
];

let dbscan = DBSCANBuilder::new()
    .eps(3.0)
    .min_samples(2)
    .fit(data.view())?;

println!("Labels: {:?}", dbscan.labels); // -1 indicates noise
println!("Core samples: {:?}", dbscan.core_sample_indices);
```
```rust
use avx_clustering::prelude::*;

let data = generate_blobs(1000, 5, 2.0)?;

let hdbscan = HDBSCANBuilder::new()
    .min_cluster_size(50)
    .min_samples(5)
    .fit(data.view())?;

println!("Number of clusters: {}", hdbscan.n_clusters());
println!("Outlier scores: {:?}", &hdbscan.outlier_scores[..10]);
```
```rust
use avx_clustering::prelude::*;

let data = generate_moons(300, 0.1)?; // Two interleaving half circles

let spectral = SpectralClusteringBuilder::new(2)
    .n_neighbors(10)
    .fit(data.view())?;

println!("Labels: {:?}", spectral.labels);
```
```rust
use avx_clustering::prelude::*;

let data = array![
    [0.0, 0.0],
    [0.1, 0.1],
    [5.0, 5.0],
    [5.1, 5.1],
];

let ap = AffinityPropagationBuilder::new()
    .damping(0.5)
    .max_iter(200)
    .fit(data.view())?;

println!("Exemplars: {}", ap.cluster_centers);
println!("Number of clusters: {}", ap.n_clusters);
```
```rust
use avx_clustering::prelude::*;

let data = generate_blobs(500, 3, 1.0)?;

let ensemble = EnsembleClusteringBuilder::new(3)
    .n_iterations(20)
    .subsample_ratio(0.8)
    .fit(data.view())?;

println!("Stability score: {:.3}", ensemble.stability_score());
println!("Labels: {:?}", &ensemble.labels[..10]);
```
```rust
use avx_clustering::prelude::*;

// Create time series data (n_series x n_timepoints)
let ts_data = array![
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1.1, 2.1, 3.1, 4.1, 5.1],
    [10.0, 9.0, 8.0, 7.0, 6.0],
];

let ts_kmeans = TimeSeriesKMeansBuilder::new(2)
    .distance_metric(TimeSeriesDistance::DTW)
    .fit(ts_data.view())?;

println!("Time series clusters: {:?}", ts_kmeans.labels);
```
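The `TimeSeriesDistance::DTW` option refers to dynamic time warping, which aligns series that are shifted or stretched in time before measuring their distance. A minimal O(n·m) DTW with absolute-difference cost and no window constraint can be sketched as follows (illustrative only; the crate's DTW may differ in cost function and constraints):

```rust
// Minimal dynamic time warping distance between two series.
// Illustrative sketch; real implementations add windowing and pruning.
fn dtw(a: &[f64], b: &[f64]) -> f64 {
    let (n, m) = (a.len(), b.len());
    // dp[i][j] = cost of the best alignment of a[..i] with b[..j]
    let mut dp = vec![vec![f64::INFINITY; m + 1]; n + 1];
    dp[0][0] = 0.0;
    for i in 1..=n {
        for j in 1..=m {
            let cost = (a[i - 1] - b[j - 1]).abs();
            // Extend the cheapest of: insertion, deletion, or match.
            dp[i][j] = cost + dp[i - 1][j].min(dp[i][j - 1]).min(dp[i - 1][j - 1]);
        }
    }
    dp[n][m]
}

fn main() {
    // Identical shapes shifted by one step have a small DTW distance
    // even though their pointwise (Euclidean) distance is large.
    let a = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0];
    let b = [0.0, 0.0, 1.0, 2.0, 3.0, 2.0];
    println!("{}", dtw(&a, &b));
}
```

This is why DTW-based k-means groups the two nearly identical rising series above together even if they were sampled slightly out of phase.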
```rust
use avx_clustering::prelude::*;

let documents = vec![
    "machine learning algorithms",
    "deep neural networks",
    "clustering data points",
    "supervised learning models",
];

let text_cluster = TextClusteringBuilder::new(2)
    .max_features(100)
    .fit(&documents)?;

println!("Document clusters: {:?}", text_cluster.labels);
```
Hardware: AMD Ryzen 9 5950X, RTX 3090
| Algorithm | Dataset Size | CPU Time | GPU Time | Speedup |
|---|---|---|---|---|
| K-Means | 1M points | 1.2s | 0.08s | 15x |
| DBSCAN | 100K points | 2.5s | 0.18s | 13.9x |
| HDBSCAN | 100K points | 4.8s | 0.35s | 13.7x |
| Spectral | 10K points | 3.2s | 0.25s | 12.8x |
Comparison with Other Libraries (100K points, K-Means):
| Library | Language | Time | Memory |
|---|---|---|---|
| avx | Rust | 1.2s | 78 MB |
| scikit-learn | Python | 3.8s | 420 MB |
| RAPIDS cuML | Python+CUDA | 1.5s | 650 MB |
| Julia Clustering | Julia | 2.1s | 180 MB |
```rust
use avx_clustering::scientific::astronomy::*;

// Load astronomical data (RA, Dec, redshift)
let galaxies = load_sdss_data("galaxies.csv")?;

let galaxy_clusters = GalaxyClusteringBuilder::new()
    .min_members(10)
    .max_radius_mpc(2.0)
    .fit(galaxies.view())?;

println!("Found {} galaxy clusters", galaxy_clusters.n_clusters());
```
```rust
use avx_clustering::scientific::physics::*;

// Particle collision data (px, py, pz, energy)
let particles = simulate_collision()?;

let jets = ParticleClusteringBuilder::new()
    .algorithm(JetAlgorithm::AntiKt)
    .radius_parameter(0.4)
    .fit(particles.view())?;

println!("Reconstructed {} jets", jets.n_clusters());
```
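The anti-kt algorithm repeatedly merges the pair with the smallest distance d_ij = min(pT_i^-2, pT_j^-2) · ΔR²_ij / R², so hard (high-pT) particles absorb nearby soft ones first, yielding cone-like jets. A sketch of the pairwise measure (`anti_kt_dij` is a hypothetical helper for illustration, not this crate's API):

```rust
// Anti-kt pairwise distance measure (illustrative sketch, not the crate's API).
// pt: transverse momentum; (y, phi): rapidity and azimuthal angle; r: radius parameter.
fn anti_kt_dij(pt_i: f64, y_i: f64, phi_i: f64,
               pt_j: f64, y_j: f64, phi_j: f64,
               r: f64) -> f64 {
    // Wrap the azimuthal difference into [-pi, pi].
    let mut dphi = phi_i - phi_j;
    while dphi > std::f64::consts::PI { dphi -= 2.0 * std::f64::consts::PI; }
    while dphi < -std::f64::consts::PI { dphi += 2.0 * std::f64::consts::PI; }
    let dr2 = (y_i - y_j).powi(2) + dphi * dphi;
    // Anti-kt uses the *inverse* squared pT, so the harder particle
    // in a pair dominates the clustering order.
    pt_i.powi(-2).min(pt_j.powi(-2)) * dr2 / (r * r)
}

fn main() {
    // A soft particle near a hard one clusters before two soft particles
    // at the same angular separation.
    let hard_soft = anti_kt_dij(100.0, 0.0, 0.0, 1.0, 0.2, 0.0, 0.4);
    let soft_soft = anti_kt_dij(1.0, 0.0, 0.0, 1.0, 0.2, 0.0, 0.4);
    assert!(hard_soft < soft_soft);
}
```

The `radius_parameter(0.4)` in the builder corresponds to `r` here: pairs separated by more than ΔR ≈ R are merged with the beam instead of each other.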
```rust
use avx_clustering::prelude::*;

let mut incremental = IncrementalKMeans::new(3);

// Process data in batches
for batch in data_stream.chunks(100) {
    incremental.partial_fit(batch.view())?;
}

println!("Final centroids:\n{}", incremental.centroids);
```
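Incremental (mini-batch) k-means variants typically keep a per-centroid count and fold each new point in with a shrinking step size, i.e. the running-mean update c ← c + (x − c)/n. A minimal sketch of that update (illustrative only; `IncrementalKMeans` may differ in details):

```rust
// Running-mean centroid update used by incremental k-means variants.
// Illustrative sketch, not the crate's implementation.
struct Centroid {
    pos: [f64; 2],
    count: u64,
}

impl Centroid {
    // Fold one assigned point into the centroid: c <- c + (x - c) / n.
    fn update(&mut self, x: [f64; 2]) {
        self.count += 1;
        let lr = 1.0 / self.count as f64; // per-centroid learning rate
        self.pos[0] += lr * (x[0] - self.pos[0]);
        self.pos[1] += lr * (x[1] - self.pos[1]);
    }
}

fn main() {
    let mut c = Centroid { pos: [0.0, 0.0], count: 0 };
    for x in [[2.0, 0.0], [4.0, 0.0], [6.0, 0.0]] {
        c.update(x);
    }
    // After all three points the centroid equals their mean, (4, 0),
    // yet no batch was ever held in memory at once.
    println!("{:?}", c.pos);
}
```

Because the step size decays as 1/n, early batches move centroids a lot and later batches refine them, which is what makes streaming `partial_fit` converge.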
```rust
#[cfg(feature = "gpu")]
{
    // The gpu module only exists when the "gpu" feature is enabled,
    // so the import lives inside the cfg block.
    use avx_clustering::gpu::*;

    let data = generate_large_dataset(10_000_000)?;
    let kmeans_gpu = KMeansGPU::new(10)
        .fit(data.view())?;
    println!("GPU clustering complete: {} clusters", kmeans_gpu.n_clusters);
}
```
```rust
use avx_clustering::prelude::*;

let data = generate_complex_data()?;

// Automatically find the best number of clusters
let optimal = auto_tune_kmeans(data.view(), 2..=10)?;

println!("Optimal k: {}", optimal.k);
println!("Silhouette score: {:.3}", optimal.score);
```
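The silhouette score used here is s(i) = (b(i) − a(i)) / max(a(i), b(i)), where a(i) is a point's mean distance to its own cluster and b(i) its mean distance to the nearest other cluster; the mean over all points is near 1 for well-separated clusters and negative for misassigned ones. A naive O(n²) sketch (illustrative only, not this crate's implementation):

```rust
// Naive O(n^2) mean silhouette coefficient for labelled 2-D points.
// Illustrative sketch, not the crate's implementation.
fn silhouette(points: &[[f64; 2]], labels: &[usize], k: usize) -> f64 {
    let dist = |p: &[f64; 2], q: &[f64; 2]| {
        ((p[0] - q[0]).powi(2) + (p[1] - q[1]).powi(2)).sqrt()
    };
    let mut total = 0.0;
    for (i, p) in points.iter().enumerate() {
        // Mean distance from p to every cluster (excluding p itself).
        let mut sum = vec![0.0; k];
        let mut cnt = vec![0usize; k];
        for (j, q) in points.iter().enumerate() {
            if i != j {
                sum[labels[j]] += dist(p, q);
                cnt[labels[j]] += 1;
            }
        }
        let a = sum[labels[i]] / cnt[labels[i]].max(1) as f64; // intra-cluster
        let b = (0..k)
            .filter(|&c| c != labels[i] && cnt[c] > 0)
            .map(|c| sum[c] / cnt[c] as f64)    // mean distance to other cluster
            .fold(f64::INFINITY, f64::min);     // nearest neighbouring cluster
        total += (b - a) / a.max(b);
    }
    total / points.len() as f64
}

fn main() {
    let points = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]];
    let good = silhouette(&points, &[0, 0, 1, 1], 2);
    let bad = silhouette(&points, &[0, 1, 0, 1], 2);
    assert!(good > bad); // the well-separated labelling scores higher
}
```

Auto-tuning then just fits each candidate k in the range and keeps the labelling with the highest mean silhouette.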
```rust
use avx_clustering::metrics::*;

// Manhattan (L1) distance as a custom metric
fn custom_distance(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b.iter())
        .map(|(x, y)| (x - y).abs())
        .sum()
}

let dbscan = DBSCANBuilder::new()
    .eps(3.0)
    .min_samples(5)
    .distance_fn(custom_distance)
    .fit(data.view())?;
```
```sh
# Run all tests
cargo test

# Run with all features
cargo test --all-features

# Run specific algorithm tests
cargo test --test kmeans
cargo test --test dbscan
```

```sh
# Run all benchmarks
cargo bench

# Run specific benchmark
cargo bench --bench kmeans_bench

# With GPU
cargo bench --features gpu --bench gpu_benchmarks
```
```
avx-clustering/
├── algorithms/              # Core clustering algorithms
│   ├── kmeans.rs
│   ├── dbscan.rs
│   ├── hdbscan.rs
│   ├── optics.rs
│   ├── affinity_propagation.rs
│   ├── mean_shift.rs
│   ├── spectral.rs
│   ├── agglomerative.rs
│   ├── ensemble.rs
│   ├── text.rs
│   └── timeseries.rs
├── gpu/                     # GPU implementations
│   ├── kmeans_gpu.rs
│   └── dbscan_gpu.rs
├── metrics/                 # Distance metrics & evaluation
│   ├── distances.rs
│   ├── silhouette.rs
│   └── davies_bouldin.rs
└── scientific/              # Domain-specific clustering
    ├── astronomy.rs         # Galaxy clustering
    ├── physics.rs           # Particle clustering
    └── spacetime.rs         # 4D tensor clustering
```
Customer segmentation:

```rust
let customer_features = extract_features(&customers)?;
let segments = KMeansBuilder::new(5).fit(customer_features.view())?;
```

Anomaly detection:

```rust
let dbscan = DBSCANBuilder::new().eps(0.3).min_samples(5).fit(data.view())?;
let anomalies: Vec<_> = dbscan.labels.iter()
    .enumerate()
    .filter(|(_, &label)| label == -1)
    .map(|(i, _)| i)
    .collect();
```

Image segmentation:

```rust
let pixels = image_to_array(&img)?;
let segments = MeanShiftBuilder::new().bandwidth(2.0).fit(pixels.view())?;
```

Document clustering:

```rust
let docs = load_documents("corpus.txt")?;
let clusters = TextClusteringBuilder::new(10)
    .max_features(1000)
    .fit(&docs)?;
```
More examples are available in `examples/`, and benchmarks in `benches/`.

| Feature | avx | scikit-learn | HDBSCAN.py | RAPIDS cuML |
|---|---|---|---|---|
| Pure Rust | ✅ | ❌ | ❌ | ❌ |
| GPU Support | ✅ | ❌ | ❌ | ✅ |
| HDBSCAN | ✅ | ❌ | ✅ | ✅ |
| Time Series | ✅ | ⚠️ | ❌ | ❌ |
| Scientific | ✅ | ❌ | ❌ | ❌ |
| Memory | Low | High | Medium | High |
| Speed (CPU) | Fast | Slow | Fast | Slow |
| Speed (GPU) | Fastest | N/A | N/A | Fast |
Licensed under either of:
at your option.
Contributions welcome! Please see CONTRIBUTING.md.
Built with ❤️ in Brazil by the avx Team