fastkmeans-rs

Crates.io: fastkmeans-rs
lib.rs: fastkmeans-rs
version: 0.1.5
created_at: 2026-01-06 16:32:10 UTC
updated_at: 2026-01-22 15:40:29 UTC
description: A fast and efficient k-means clustering implementation in Rust, compatible with ndarray
homepage: https://github.com/lightonai/fastkmeans-rs
repository: https://github.com/lightonai/fastkmeans-rs
size: 153,558
author: Raphael Sourty (raphaelsty)
documentation: https://docs.rs/fastkmeans-rs

README

fastkmeans-rs


A Rust rewrite of FastKMeans for CPU-based clustering.


This crate is a straightforward Rust port of the excellent fastkmeans Python library by Answer.AI. It provides the same double-chunking k-means algorithm optimized for large-scale clustering without running out of memory, now with Rust performance and multi-threaded parallelization via rayon.


Features

  • Double-chunking algorithm — Processes both data points and centroids in configurable chunks to avoid OOM issues on large datasets
  • Multi-threaded — Leverages rayon for parallel computation across all CPU cores
  • Optional BLAS acceleration — Enable accelerate (macOS) or openblas features for optimized matrix operations
  • ndarray compatible — Seamlessly integrates with the Rust scientific computing ecosystem
  • Familiar API — Provides both FAISS-style (train/predict) and scikit-learn-style (fit/fit_predict) interfaces
  • Memory efficient — Constant memory usage regardless of dataset size through chunked processing
  • Reproducible — Seeded random number generation for deterministic results

Installation

Add fastkmeans-rs to your Cargo.toml:

[dependencies]
fastkmeans-rs = "0.1.5"

Or via cargo:

cargo add fastkmeans-rs

With BLAS Acceleration (Recommended)

For optimal performance, enable a BLAS backend:

# macOS (recommended - uses Apple Accelerate framework)
fastkmeans-rs = { version = "0.1", features = ["accelerate"] }

# Linux/Windows (requires OpenBLAS installed)
fastkmeans-rs = { version = "0.1", features = ["openblas"] }

Benchmarks

Accuracy Validation

fastkmeans-rs produces clustering results equivalent to the reference Python fastkmeans implementation. We validate this by comparing inertia (sum of squared distances to assigned centroids):

| Samples | Features | k | Python Inertia | Rust Inertia | Difference |
|---------|----------|---|----------------|--------------|------------|
| 1,000   | 64       | 10 | 59,630.96     | 59,816.55    | 0.31%      |
| 500     | 32       | 5  | 14,693.69     | 14,588.72    | 0.71%      |
| 2,000   | 128      | 20 | 242,046.80    | 242,268.88   | 0.09%      |

The small differences (<1%) are expected: k-means converges to a local minimum that depends on the random initialization.
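For reference, inertia is just the sum of squared Euclidean distances from each point to its assigned centroid. A minimal sketch using plain slices (illustrative only; the benchmark scripts compute the same quantity over ndarray/PyTorch data):

```rust
// Inertia: sum over points of the squared distance to the assigned centroid.
fn inertia(points: &[Vec<f32>], centroids: &[Vec<f32>], labels: &[usize]) -> f32 {
    points
        .iter()
        .zip(labels)
        .map(|(p, &l)| {
            p.iter()
                .zip(&centroids[l])
                .map(|(x, c)| (x - c) * (x - c))
                .sum::<f32>()
        })
        .sum()
}

fn main() {
    // Two points, one centroid between them: inertia = 1 + 1 = 2.
    let points = vec![vec![0.0, 0.0], vec![2.0, 0.0]];
    let centroids = vec![vec![1.0, 0.0]];
    let labels = vec![0, 0];
    println!("inertia = {}", inertia(&points, &centroids, &labels));
}
```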

Run the accuracy benchmark yourself:

uv run benches/compare_reference.py

Performance Comparison

Benchmark comparing fastkmeans-rs (with BLAS) against Python fastkmeans (using PyTorch) on an Apple M3 Max:

| Config | Samples | Dims | k   | Python | Rust   | Speedup |
|--------|---------|------|-----|--------|--------|---------|
| Small  | 1,000   | 64   | 10  | 0.006s | 0.003s | 2.16x   |
| Medium | 5,000   | 64   | 50  | 0.016s | 0.015s | 1.06x   |
| Large  | 10,000  | 128  | 100 | 0.044s | 0.041s | 1.07x   |
| XL     | 25,000  | 128  | 100 | 0.098s | 0.083s | 1.18x   |
| XXL    | 50,000  | 128  | 256 | 0.315s | 0.293s | 1.08x   |

Average speedup: 1.31x (with BLAS enabled)

Run the performance benchmark yourself:

uv run benches/benchmark_comparison.py

Note: Without BLAS, Rust performance is slower than Python on large datasets. Enable accelerate or openblas features for optimal performance.


Quick Start

use fastkmeans_rs::FastKMeans;
use ndarray::Array2;
use ndarray_rand::RandomExt;
use ndarray_rand::rand_distr::Uniform;

fn main() {
    // Generate 10,000 random points in 128 dimensions
    let data = Array2::random((10_000, 128), Uniform::new(-1.0f32, 1.0));

    // Create k-means with 100 clusters
    let mut kmeans = FastKMeans::new(128, 100);

    // Train the model
    kmeans.train(&data.view()).unwrap();

    // Get cluster assignments
    let labels = kmeans.predict(&data.view()).unwrap();

    println!("Assigned {} points to clusters", labels.len());
}

Usage

Basic Usage

use fastkmeans_rs::FastKMeans;
use ndarray::Array2;

// Your data as an ndarray (n_samples x n_features)
let data: Array2<f32> = /* your data */;

// Create a FastKMeans instance
// Arguments: dimensions (d), number of clusters (k)
let mut kmeans = FastKMeans::new(128, 50);

// Train the model
kmeans.train(&data.view())?;

// Predict cluster assignments for new data
let labels = kmeans.predict(&data.view())?;

// Access the learned centroids
if let Some(centroids) = kmeans.centroids() {
    println!("Centroid shape: {:?}", centroids.shape());
}

Fit-Predict Pattern

For convenience, you can fit and predict in a single call:

use fastkmeans_rs::FastKMeans;

let mut kmeans = FastKMeans::new(128, 50);

// Fit and predict in one step
let labels = kmeans.fit_predict(&data.view())?;

Custom Configuration

Fine-tune the algorithm with KMeansConfig:

use fastkmeans_rs::{FastKMeans, KMeansConfig};

let config = KMeansConfig {
    k: 100,                          // Number of clusters
    max_iters: 50,                   // Maximum iterations (default: 25)
    tol: 1e-6,                       // Convergence tolerance (default: 1e-8)
    seed: 42,                        // Random seed for reproducibility
    max_points_per_centroid: Some(256), // Subsampling threshold (None to disable)
    chunk_size_data: 51_200,         // Data chunk size for memory control
    chunk_size_centroids: 10_240,    // Centroid chunk size
    verbose: true,                   // Print progress information
};

let mut kmeans = FastKMeans::with_config(config);
kmeans.fit(&data.view())?;

Configuration Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| k | — | Number of clusters (required) |
| max_iters | 25 | Maximum number of iterations |
| tol | 1e-8 | Convergence tolerance based on centroid shift |
| seed | 0 | Random seed for reproducible results |
| max_points_per_centroid | Some(256) | If set, subsamples data when n > k * value |
| chunk_size_data | 51,200 | Number of data points processed per chunk |
| chunk_size_centroids | 10,240 | Number of centroids processed per chunk |
| verbose | false | Print iteration progress |
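As a rough sanity check when tuning the chunk sizes: the live distance block holds chunk_size_data × chunk_size_centroids f32 values, independent of the dataset size. A back-of-envelope sketch (illustrative helper, not part of the crate's API):

```rust
// Size in bytes of the f32 distance block that is live at any one time.
fn distance_block_bytes(chunk_data: usize, chunk_cent: usize) -> usize {
    chunk_data * chunk_cent * std::mem::size_of::<f32>()
}

fn main() {
    // With the defaults (51,200 x 10,240), the block is 2,097,152,000 bytes,
    // i.e. 2,000 MiB; shrink either chunk size to trade speed for memory.
    let bytes = distance_block_bytes(51_200, 10_240);
    println!("{} MiB", bytes / (1024 * 1024));
}
```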

Algorithm

This implementation uses the double-chunking k-means algorithm from the original FastKMeans:

  1. Efficient distance computation using the identity:

    ||x - c||² = ||x||² + ||c||² - 2·x·c
    

    This avoids materializing the full distance matrix by pre-computing norms and using matrix multiplication.

  2. Double-chunking to control memory:

    • Outer loop: process data in chunks of chunk_size_data
    • Inner loop: process centroids in chunks of chunk_size_centroids
    • Memory usage is O(chunk_size_data × chunk_size_centroids) regardless of total dataset size
  3. Parallel processing via rayon for:

    • Norm computation
    • Distance calculations
    • Cluster assignment updates
    • Centroid recomputation
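The steps above can be sketched in a few lines. The following is a simplified, dependency-free illustration of double-chunked nearest-centroid assignment via the norm identity; the crate itself works over ndarray with BLAS and rayon, and the function names here are purely illustrative:

```rust
// ||x - c||^2 = ||x||^2 + ||c||^2 - 2 x.c, evaluated one
// (chunk_data x chunk_cent) block at a time.
fn sq_norm(v: &[f32]) -> f32 { v.iter().map(|x| x * x).sum() }
fn dot(a: &[f32], b: &[f32]) -> f32 { a.iter().zip(b).map(|(x, y)| x * y).sum() }

fn assign_chunked(
    data: &[Vec<f32>],
    centroids: &[Vec<f32>],
    chunk_data: usize,
    chunk_cent: usize,
) -> Vec<usize> {
    // Norms are precomputed once; only dot products are recomputed per block.
    let data_norms: Vec<f32> = data.iter().map(|x| sq_norm(x)).collect();
    let cent_norms: Vec<f32> = centroids.iter().map(|c| sq_norm(c)).collect();
    let mut best = vec![(f32::INFINITY, 0usize); data.len()];

    // Outer loop over data chunks, inner loop over centroid chunks:
    // only a chunk_data x chunk_cent block of distances is live at once.
    for ds in (0..data.len()).step_by(chunk_data) {
        let de = (ds + chunk_data).min(data.len());
        for cs in (0..centroids.len()).step_by(chunk_cent) {
            let ce = (cs + chunk_cent).min(centroids.len());
            for i in ds..de {
                for j in cs..ce {
                    let d2 =
                        data_norms[i] + cent_norms[j] - 2.0 * dot(&data[i], &centroids[j]);
                    if d2 < best[i].0 {
                        best[i] = (d2, j);
                    }
                }
            }
        }
    }
    best.into_iter().map(|(_, j)| j).collect()
}

fn main() {
    let data = vec![vec![0.0, 0.0], vec![10.0, 10.0], vec![0.5, 0.0]];
    let centroids = vec![vec![0.0, 0.0], vec![10.0, 10.0]];
    let labels = assign_chunked(&data, &centroids, 2, 1);
    println!("{:?}", labels); // [0, 1, 0]
}
```

In the real implementation the per-block `dot` loop becomes a single matrix multiplication handed to BLAS, and rayon parallelizes across chunks.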

Performance Tips

  • Adjust chunk sizes based on your available memory. Larger chunks = faster but more memory
  • Use subsampling (max_points_per_centroid) for very large datasets during initial exploration
  • Set verbose: true to monitor convergence and iteration times
  • Compile with --release for optimal performance (10-100x faster than debug builds)

Example: Large-Scale Clustering

use fastkmeans_rs::{FastKMeans, KMeansConfig};
use ndarray::Array2;
use ndarray_rand::RandomExt;
use ndarray_rand::rand_distr::Uniform;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Simulate 1 million points in 256 dimensions
    let n_samples = 1_000_000;
    let n_features = 256;
    let n_clusters = 1000;

    println!("Generating {} points...", n_samples);
    let data = Array2::random((n_samples, n_features), Uniform::new(-1.0f32, 1.0));

    let config = KMeansConfig {
        k: n_clusters,
        max_iters: 25,
        tol: 1e-6,
        seed: 42,
        max_points_per_centroid: Some(256),  // Subsample to 256k points
        chunk_size_data: 50_000,
        chunk_size_centroids: 500,
        verbose: true,
    };

    let mut kmeans = FastKMeans::with_config(config);

    println!("Training k-means with {} clusters...", n_clusters);
    kmeans.train(&data.view())?;

    println!("Predicting labels...");
    let labels = kmeans.predict(&data.view())?;

    println!("Done! Assigned {} points to {} clusters", labels.len(), n_clusters);

    Ok(())
}

Contributing

Setup

  1. Clone the repository:

     git clone https://github.com/lightonai/fastkmeans-rs.git
     cd fastkmeans-rs

  2. Install the git hooks:

     make install-hooks

Development Commands

| Command | Description |
|---------|-------------|
| make build | Build the project in debug mode |
| make release | Build in release mode |
| make test | Run all tests |
| make lint | Run clippy and format checks |
| make fmt | Format code |
| make doc | Build documentation |
| make bench | Run benchmarks |
| make example | Run the basic example |
| make ci | Run all CI checks locally |
| make build-blas | Build with BLAS acceleration (auto-detects platform) |
| make test-blas | Run tests with BLAS acceleration |
| make compare-reference | Compare output with Python fastkmeans (requires uv) |
| make benchmark-comparison | Run performance comparison vs Python fastkmeans |

Running CI Locally

Before submitting a PR, run all checks locally:

make ci

This runs formatting checks, clippy, tests, documentation build, and benchmark compilation.


Acknowledgements

This crate is a Rust port of FastKMeans by Answer.AI. All credit for the algorithm design and optimization strategies goes to the original authors.


License

Apache-2.0 License - see LICENSE for details.
