kmeans

Crates.io: kmeans
lib.rs: kmeans
source: src
created_at: 2019-07-27 15:57:05.090304
updated_at: 2024-11-17 03:52:22.281279
description: Small and fast library for k-means clustering calculations.
repository: https://github.com/seijikun/kmean-rs
id: 152102
Markus Ebner (seijikun)

README

kmeans

kmeans is a small and fast library for k-means clustering calculations. It requires a nightly compiler with the portable_simd feature to work.

Here is a small example, using k-means++ as the initialization method and Lloyd's algorithm as the k-means variant:

use kmeans::*;

fn main() {
    let (sample_cnt, sample_dims, k, max_iter) = (20000, 200, 4, 100);

    // Generate some random data (uses the rand crate)
    let mut samples = vec![0.0f64; sample_cnt * sample_dims];
    samples.iter_mut().for_each(|v| *v = rand::random());

    // Calculate k-means, using k-means++ as the initialization method.
    // KMeans<f64, 8, _> selects f64 SIMD vectors with 8 lanes (e.g. AVX-512)
    let kmean: KMeans<f64, 8, _> = KMeans::new(samples, sample_cnt, sample_dims, EuclideanDistance);
    let result = kmean.kmeans_lloyd(k, max_iter, KMeans::init_kmeanplusplus, &KMeansConfig::default());

    println!("Centroids: {:?}", result.centroids);
    println!("Cluster-Assignments: {:?}", result.assignments);
    println!("Error: {}", result.distsum);
}

Data structures

For performance reasons, all calculations are done on bare vectors, using hand-written SIMD code built on the portable_simd feature mentioned above. All sample data is stored row-major, so each sample occupies one consecutive block of memory.
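To make the row-major layout concrete, here is a minimal sketch in plain Rust. The helper names `sample_row` and `sample_count` are hypothetical and are not part of the kmeans API; they only illustrate how a flat buffer of `sample_cnt * sample_dims` values maps to individual samples.

```rust
/// Number of samples in a row-major buffer with `sample_dims` values per sample.
/// (Hypothetical helper, not part of the kmeans crate.)
fn sample_count(samples: &[f64], sample_dims: usize) -> usize {
    samples.len() / sample_dims
}

/// Slice holding sample `i`: the values at
/// `i * sample_dims .. (i + 1) * sample_dims` of the flat buffer.
/// (Hypothetical helper, not part of the kmeans crate.)
fn sample_row(samples: &[f64], sample_dims: usize, i: usize) -> &[f64] {
    let start = i * sample_dims;
    &samples[start..start + sample_dims]
}
```

With a buffer of six values and `sample_dims = 3`, `sample_row(&buf, 3, 1)` yields the fourth through sixth values, i.e. the second sample.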

Supported variants / algorithms

  • Lloyd (standard k-means)
  • minibatch
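
As an illustration of what the Lloyd variant computes, the following is a scalar sketch of a single iteration on row-major data. It is not the crate's implementation (which uses SIMD), and the function names `sq_dist` and `lloyd_step` are assumptions made for this example. Using the squared Euclidean distance as the error measure is also an assumption of the sketch.

```rust
/// Squared Euclidean distance between two equally long slices.
fn sq_dist(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// One Lloyd iteration: assign every sample to its nearest centroid, then
/// move each centroid to the mean of its assigned samples.
/// Returns the assignments and the summed squared distance measured
/// against the centroids as they were before the update.
/// (Hypothetical helper, not part of the kmeans crate.)
fn lloyd_step(samples: &[f64], dims: usize, centroids: &mut [f64]) -> (Vec<usize>, f64) {
    let k = centroids.len() / dims;
    let mut assignments = Vec::with_capacity(samples.len() / dims);
    let mut sums = vec![0.0; k * dims];
    let mut counts = vec![0usize; k];
    let mut dist_sum = 0.0;

    for sample in samples.chunks_exact(dims) {
        // Assignment step: find the nearest centroid for this sample.
        let (best, best_dist) = centroids
            .chunks_exact(dims)
            .map(|c| sq_dist(sample, c))
            .enumerate()
            .fold((0, f64::INFINITY), |acc, (i, d)| if d < acc.1 { (i, d) } else { acc });
        assignments.push(best);
        dist_sum += best_dist;
        counts[best] += 1;
        // Accumulate the sample into the running sum of its cluster.
        for (s, v) in sums[best * dims..(best + 1) * dims].iter_mut().zip(sample) {
            *s += v;
        }
    }

    // Update step: clusters without samples keep their old centroid.
    for (c, centroid) in centroids.chunks_exact_mut(dims).enumerate() {
        if counts[c] > 0 {
            for (dst, s) in centroid.iter_mut().zip(&sums[c * dims..(c + 1) * dims]) {
                *dst = s / counts[c] as f64;
            }
        }
    }
    (assignments, dist_sum)
}
```

Calling `lloyd_step` repeatedly until the assignments stop changing, or until an iteration limit such as `max_iter` is reached, gives the standard algorithm. A mini-batch variant would run the same two steps on a random subset of the samples per iteration.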

Supported centroid initialization methods

  • k-means++
  • random partition
  • random sample
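
For readers unfamiliar with k-means++ seeding, here is a scalar sketch of the general technique: the first centroid is a uniformly random sample, and each further centroid is drawn with probability proportional to its squared distance from the nearest centroid chosen so far. This is not the crate's `init_kmeanplusplus`; the function name `kmeans_plusplus_indices` and the `uniform` closure (expected to return values in [0, 1)) are assumptions of this example, chosen so that no particular random number crate is required.

```rust
/// Squared Euclidean distance between two equally long slices.
fn sq_dist(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// k-means++ seeding on row-major data. Returns the indices of the samples
/// chosen as initial centroids. `uniform` must return values in [0, 1).
/// (Hypothetical helper, not part of the kmeans crate.)
fn kmeans_plusplus_indices(
    samples: &[f64],
    dims: usize,
    k: usize,
    mut uniform: impl FnMut() -> f64,
) -> Vec<usize> {
    fn row(samples: &[f64], dims: usize, i: usize) -> &[f64] {
        &samples[i * dims..(i + 1) * dims]
    }
    let n = samples.len() / dims;
    assert!(n > 0 && k > 0, "need at least one sample and one centroid");

    // First centroid: a uniformly random sample.
    let first = ((uniform() * n as f64) as usize).min(n - 1);
    let mut chosen = vec![first];
    // Squared distance from every sample to its nearest chosen centroid.
    let mut nearest: Vec<f64> = (0..n)
        .map(|i| sq_dist(row(samples, dims, i), row(samples, dims, first)))
        .collect();

    while chosen.len() < k {
        // Draw the next centroid with probability proportional to `nearest`.
        let total: f64 = nearest.iter().sum();
        let mut target = uniform() * total;
        let mut next = n - 1; // fallback if rounding exhausts the loop
        for (i, d) in nearest.iter().enumerate() {
            if target < *d {
                next = i;
                break;
            }
            target -= d;
        }
        chosen.push(next);
        // Refresh the nearest-centroid distances with the new centroid.
        for i in 0..n {
            let d = sq_dist(row(samples, dims, i), row(samples, dims, next));
            nearest[i] = nearest[i].min(d);
        }
    }
    chosen
}
```

Samples that are already centroids have distance zero and are therefore skipped by the weighted draw, which is what spreads the initial centroids across the data. The random sample method, by contrast, would pick all `k` indices uniformly.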

Supported distance functions

  • Euclidean distance
  • Histogram distance

Commit count: 51