kmeans_uni

Crates.io: kmeans_uni
lib.rs: kmeans_uni
version: 0.1.0
created_at: 2025-12-08 08:28:06.010678+00
updated_at: 2025-12-08 08:28:06.010678+00
description: Fast, safe K-Means++ with SIMD acceleration, mini-batch training and WASM support.
homepage: https://github.com/Deniskore
repository: https://github.com/Deniskore/kmeans_uni
id: 1972953
size: 195,154
Denis (Deniskore)

kmeans_uni

Fast, safe K-Means++ for CPU-only workloads with optional SIMD acceleration. Supports Euclidean distance and dot-product scoring, provides both classic Lloyd iterations and a mini-batch variant, and includes parity tests against linfa-clustering to guard correctness. Benchmarks show significantly faster training and prediction than linfa on the same CPU. The crate builds on stable Rust.
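For readers unfamiliar with the term, a single classic Lloyd iteration with Euclidean distance can be sketched in plain Rust as below. This is a textbook illustration using only the standard library, not the crate's internal code; the function names (`squared_distance`, `nearest_centroid`, `lloyd_step`) are hypothetical, and the flat row-major layout is assumed from the Quickstart example.

```rust
/// Squared Euclidean distance between two points of equal length.
fn squared_distance(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Index of the centroid closest to `point`.
/// `centroids` is a flat row-major buffer with `n_cols` values per centroid.
fn nearest_centroid(point: &[f32], centroids: &[f32], n_cols: usize) -> usize {
    let mut best = 0;
    let mut best_dist = f32::INFINITY;
    for (idx, centroid) in centroids.chunks_exact(n_cols).enumerate() {
        let dist = squared_distance(point, centroid);
        if dist < best_dist {
            best = idx;
            best_dist = dist;
        }
    }
    best
}

/// One Lloyd iteration: assign every point to its nearest centroid, then move
/// each centroid to the mean of its assigned points. Returns the labels.
fn lloyd_step(data: &[f32], n_cols: usize, centroids: &mut [f32]) -> Vec<usize> {
    let k = centroids.len() / n_cols;

    // Assignment step.
    let labels: Vec<usize> = data
        .chunks_exact(n_cols)
        .map(|p| nearest_centroid(p, centroids, n_cols))
        .collect();

    // Update step: accumulate per-cluster sums and counts.
    let mut sums = vec![0.0f32; k * n_cols];
    let mut counts = vec![0usize; k];
    for (point, &label) in data.chunks_exact(n_cols).zip(&labels) {
        counts[label] += 1;
        let row = &mut sums[label * n_cols..(label + 1) * n_cols];
        for (s, v) in row.iter_mut().zip(point) {
            *s += *v;
        }
    }
    for (c, &count) in counts.iter().enumerate() {
        // Empty clusters keep their previous centroid in this sketch.
        if count > 0 {
            for j in 0..n_cols {
                centroids[c * n_cols + j] = sums[c * n_cols + j] / count as f32;
            }
        }
    }
    labels
}
```

Calling `lloyd_step` repeatedly until the labels stop changing gives the classic algorithm. A production implementation would add seeding (K-Means++), a convergence tolerance, and a strategy for empty clusters.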

Key Features

  • 100% safe Rust (#![forbid(unsafe_code)]) with a small dependency set.
  • Optimized for speed: outperforms linfa-clustering in AArch64 and x86_64 benchmarks for both training and prediction.
  • Optional SIMD acceleration (wide feature) and WebAssembly support (see WASM.md).
  • Ergonomic builder API.

Quickstart

use kmeans_uni::KMeansBuilder;

const N_COLS: usize = 2;
let data: Vec<f32> = vec![
    1.0, 1.0,
    1.2, 0.9,
    -1.0, -1.1,
    -1.2, -0.8,
];

match KMeansBuilder::new(2)
    .iterations(100)
    .cpu_simd() // requires default "wide" feature
    .euclidean()
    .build()
    .fit(&data, N_COLS)
{
    Ok(model) => match model.predict(&data) {
        Ok(labels) => println!("labels: {labels:?}"),
        Err(err) => eprintln!("prediction failed: {err}"),
    },
    Err(err) => eprintln!("training failed: {err}"),
}
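The Quickstart passes points as one flat buffer plus a column count. The sketch below shows how such a buffer can be read as individual points, assuming a row-major layout (each consecutive run of `N_COLS` values is one point), which is what the formatting of `data` above suggests. The helper `as_points` is hypothetical and not part of kmeans_uni.

```rust
/// Splits a flat row-major buffer into points of `n_cols` values each.
/// Returns `None` when `n_cols` is zero or the buffer length is not a
/// multiple of `n_cols`.
fn as_points(data: &[f32], n_cols: usize) -> Option<Vec<&[f32]>> {
    if n_cols == 0 || data.len() % n_cols != 0 {
        return None;
    }
    Some(data.chunks_exact(n_cols).collect())
}
```

With the Quickstart's `data` and `N_COLS = 2`, this yields four two-dimensional points, so a label vector for that input would be expected to have four entries, one per point.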

Mini-batch training for large datasets:

use kmeans_uni::KMeansBuilder;

// `data` and `N_COLS` are the flat buffer and column count from the Quickstart above.
match KMeansBuilder::new(8)
    .iterations(50) // iterations = number of batches
    .cpu_scalar()
    .euclidean()
    .mini_batch_rel_tolerance(0.0)
    .mini_batch_patience(0)
    .build()
    .fit_mini_batch_from_source(
        &kmeans_uni::SlicePointSource::new(&data, N_COLS).unwrap(),
        256, // batch size
    )
{
    Ok(model) => println!("centroids: {:?}", model.centroids),
    Err(err) => eprintln!("mini-batch training failed: {err}"),
}
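To make the "one batch per iteration" idea concrete, here is a sketch of a common mini-batch K-Means update rule, in which each centroid moves toward its assigned points with a learning rate of 1/count. The README does not describe the crate's internal update rule, so treat this as a generic illustration; `mini_batch_update` and its parameters are hypothetical names.

```rust
/// Applies one mini-batch update in place.
/// `centroids` and `batch` are flat row-major buffers with `n_cols` values
/// per row; `counts[c]` tracks how many points centroid `c` has absorbed.
fn mini_batch_update(
    centroids: &mut [f32],
    counts: &mut [usize],
    batch: &[f32],
    n_cols: usize,
) {
    for point in batch.chunks_exact(n_cols) {
        // Find the nearest centroid by squared Euclidean distance.
        let mut best = 0;
        let mut best_dist = f32::INFINITY;
        for (idx, c) in centroids.chunks_exact(n_cols).enumerate() {
            let dist: f32 = c.iter().zip(point).map(|(a, b)| (a - b) * (a - b)).sum();
            if dist < best_dist {
                best = idx;
                best_dist = dist;
            }
        }

        // The per-centroid learning rate shrinks as the centroid sees more points.
        counts[best] += 1;
        let rate = 1.0 / counts[best] as f32;
        let row = &mut centroids[best * n_cols..(best + 1) * n_cols];
        for (c, x) in row.iter_mut().zip(point) {
            *c += rate * (*x - *c);
        }
    }
}
```

A training loop under this scheme would draw a fresh batch and call the update once per iteration, which matches the "iterations = number of batches" comment in the example above.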

Crate features

  • wide (default): enables the SIMD CPU backend for K-Means (f32 and f64) via the wide crate.
  • Once std::simd stabilizes, portable SIMD may offer further performance without depending on wide.
  • serde: derive Serialize/Deserialize on public types.
  • wasm: build with a wasm-friendly configuration (sequential execution, no Rayon). See WASM.md for a browser demo and build steps.
  • Build without defaults (--no-default-features) to force scalar-only code paths.
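As a sketch of how these features might be selected in a downstream Cargo.toml: the feature names and version come from this page, while the specific combination shown in the commented line is only an illustrative assumption.

```toml
[dependencies]
# Default build: includes the `wide` SIMD backend.
kmeans_uni = "0.1.0"

# Alternative: scalar-only build with serde support.
# kmeans_uni = { version = "0.1.0", default-features = false, features = ["serde"] }
```

For WebAssembly targets, WASM.md remains the reference for the exact feature set and build steps.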

License

Licensed under either of:

  • MIT license
  • Apache License, Version 2.0

at your option.
