| Crates.io | kategorize |
| lib.rs | kategorize |
| version | 0.3.0 |
| created_at | 2025-08-27 17:20:53.551845+00 |
| updated_at | 2025-09-16 02:27:54.225705+00 |
| description | K-modes and K-prototypes clustering algorithms for categorical and mixed data |
| repository | https://github.com/ethqnol/kategorize |
| id | 1812994 |
| size | 174,333 |
A fast, memory-efficient Rust implementation of k-modes and k-prototypes clustering algorithms for categorical and mixed data.
Add this to your Cargo.toml:
[dependencies]
kategorize = "0.3"
use kategorize::{KModes, InitMethod, DistanceMetric};
use ndarray::Array2;
// Create categorical data
let data = Array2::from_shape_vec((6, 2), vec![
    "A", "X", "A", "X", "B", "Y",
    "B", "Y", "C", "Z", "C", "Z",
]).unwrap();
// Configure and run k-modes clustering
let kmodes = KModes::new(3)
    .init_method(InitMethod::Huang)
    .distance_metric(DistanceMetric::Jaccard) // Choose distance metric
    .use_incremental_updates(true)            // Enable performance optimization
    .max_iter(100)
    .n_init(10)
    .random_state(42);
let result = kmodes.fit(data.view()).unwrap();
println!("Cluster assignments: {:?}", result.labels);
println!("Centroids: {:?}", result.centroids);
println!("Converged: {}", result.converged);
use kategorize::{KPrototypes, MixedValue};
use ndarray::Array2;
// Create mixed categorical and numerical data
let data = Array2::from_shape_vec((4, 3), vec![
    MixedValue::Categorical("A"), MixedValue::Categorical("X"), MixedValue::Numerical(1.0),
    MixedValue::Categorical("A"), MixedValue::Categorical("X"), MixedValue::Numerical(2.0),
    MixedValue::Categorical("B"), MixedValue::Categorical("Y"), MixedValue::Numerical(10.0),
    MixedValue::Categorical("B"), MixedValue::Categorical("Y"), MixedValue::Numerical(11.0),
]).unwrap();
let kprototypes = KPrototypes::new(2, vec![0, 1], vec![2]) // categorical: [0,1], numerical: [2]
    .gamma(1.0) // weight for numerical vs categorical features
    .random_state(42);
let result = kprototypes.fit(data.view(), vec![0, 1], vec![2]).unwrap();
println!("Mixed data clustering result: {:?}", result.labels);
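K-modes extends k-means clustering to categorical data by replacing Euclidean distance with a dissimilarity measure for categories (classically, the number of mismatched features), representing each cluster by its per-feature mode instead of its mean, and updating those modes from category frequencies.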
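A minimal sketch of the two classic k-modes building blocks, a mismatch count and a per-feature mode, using only the Rust standard library. This is an illustration, not Kategorize's implementation: the names `matching_dissimilarity` and `cluster_mode` are hypothetical, and breaking ties alphabetically is an arbitrary choice made here so the result is deterministic.

```rust
use std::collections::HashMap;

/// Simple matching dissimilarity: the number of positions where two rows differ.
/// (Hypothetical helper, not part of the kategorize API.)
fn matching_dissimilarity(a: &[&str], b: &[&str]) -> usize {
    a.iter().zip(b.iter()).filter(|(x, y)| x != y).count()
}

/// Per-feature mode of the rows in one cluster.
/// Ties go to the lexicographically smallest category (an assumption of this sketch).
fn cluster_mode<'a>(rows: &[Vec<&'a str>]) -> Vec<&'a str> {
    let n_features = rows.first().map_or(0, |r| r.len());
    (0..n_features)
        .map(|j| {
            // Count how often each category appears in column j.
            let mut counts: HashMap<&'a str, usize> = HashMap::new();
            for row in rows {
                *counts.entry(row[j]).or_insert(0) += 1;
            }
            counts
                .into_iter()
                // Highest count wins; among equal counts prefer the smaller string.
                .max_by(|(va, ca), (vb, cb)| ca.cmp(cb).then_with(|| vb.cmp(va)))
                .map(|(v, _)| v)
                .unwrap()
        })
        .collect()
}
```

An assignment step would then place each row in the cluster whose mode has the smallest `matching_dissimilarity` to it, and an update step would recompute `cluster_mode` for every cluster.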
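K-prototypes combines k-modes and k-means to handle mixed data by measuring numerical features with squared Euclidean distance and categorical features with a matching dissimilarity, weighting the two parts against each other with gamma, and updating each prototype with means for numerical features and modes for categorical ones.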
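The combined distance can be sketched as follows. This assumes the textbook k-prototypes formulation, in which `gamma` multiplies the categorical mismatch count; whether Kategorize applies `gamma` in exactly this way is not confirmed here. `MixedRow` and `mixed_dissimilarity` are hypothetical names for illustration only.

```rust
/// One row of mixed data, split into numerical and categorical parts.
/// (Hypothetical type, not the crate's `MixedValue`.)
struct MixedRow<'a> {
    numerical: Vec<f64>,
    categorical: Vec<&'a str>,
}

/// Squared Euclidean distance on the numerical part plus
/// `gamma` times the number of categorical mismatches.
fn mixed_dissimilarity(a: &MixedRow, b: &MixedRow, gamma: f64) -> f64 {
    let numeric: f64 = a
        .numerical
        .iter()
        .zip(&b.numerical)
        .map(|(x, y)| (x - y).powi(2))
        .sum();
    let mismatches = a
        .categorical
        .iter()
        .zip(&b.categorical)
        .filter(|(x, y)| x != y)
        .count();
    numeric + gamma * mismatches as f64
}
```

Under this formulation a larger `gamma` lets categorical mismatches dominate the clustering, while a `gamma` near zero reduces it to k-means on the numerical columns.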
Kategorize is designed for performance with:
By default, Kategorize updates cluster modes incrementally. It caches per-cluster frequency counts for each feature and adjusts them as points change assignment, so a mode does not have to be recomputed from every member of the cluster on each iteration.
let kmodes = KModes::new(3)
    .use_incremental_updates(true) // Enabled by default
    .random_state(42);
// For comparison, disable to use the classic algorithm:
let classic_kmodes = KModes::new(3)
    .use_incremental_updates(false)
    .random_state(42);
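To make the caching idea concrete, here is a minimal sketch of a per-cluster frequency cache in plain Rust. It shows the general technique only and makes no claim about Kategorize's internals: `ClusterCounts` and its methods are hypothetical, and ties are broken alphabetically as an arbitrary choice.

```rust
use std::collections::HashMap;

/// Per-cluster cache: for every feature, how often each category occurs.
/// (Hypothetical type for illustration.)
struct ClusterCounts {
    counts: Vec<HashMap<String, usize>>,
}

impl ClusterCounts {
    fn new(n_features: usize) -> Self {
        ClusterCounts { counts: vec![HashMap::new(); n_features] }
    }

    /// Register a row that joined the cluster.
    fn add(&mut self, row: &[&str]) {
        for (feature, value) in self.counts.iter_mut().zip(row) {
            *feature.entry(value.to_string()).or_insert(0) += 1;
        }
    }

    /// Unregister a row that left the cluster, dropping categories that reach zero.
    fn remove(&mut self, row: &[&str]) {
        for (feature, value) in self.counts.iter_mut().zip(row) {
            let emptied = match feature.get_mut(*value) {
                Some(count) => {
                    *count -= 1;
                    *count == 0
                }
                None => false,
            };
            if emptied {
                feature.remove(*value);
            }
        }
    }

    /// Read the mode from the cached counts; `None` for a feature with no rows.
    /// Ties go to the lexicographically smallest category.
    fn mode(&self) -> Vec<Option<&str>> {
        self.counts
            .iter()
            .map(|feature| {
                feature
                    .iter()
                    .max_by(|(va, ca), (vb, cb)| ca.cmp(cb).then_with(|| vb.cmp(va)))
                    .map(|(v, _)| v.as_str())
            })
            .collect()
    }
}
```

When a point moves from one cluster to another, only `remove` on the old cluster and `add` on the new one are needed; each costs time proportional to the number of features rather than to the cluster size.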
Performance benefits vary by dataset size:
Check out the examples directory for comprehensive usage patterns:
basic_kmodes.rs - Basic k-modes clustering
kprototypes_mixed_data.rs - Mixed data clustering
advanced_usage.rs - Parameter tuning and optimization
jaccard_distance.rs - Jaccard distance metric usage
Run examples with:
cargo run --example basic_kmodes
cargo run --example kprototypes_mixed_data
cargo run --example advanced_usage
cargo run --example jaccard_distance
Run benchmarks to see performance characteristics:
cargo bench
This will run comprehensive benchmarks testing:
For detailed API documentation, visit docs.rs/kategorize or run:
cargo doc --open
Kategorize provides similar functionality to the popular Python kmodes library but with:
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
Areas where contributions would be particularly valuable:
This project is licensed under the MIT License.
If you use Kategorize in your research, please cite:
@software{kategorize,
title = {Kategorize: K-modes and K-prototypes clustering for Rust},
author = {Wu, Ethan},
year = {2024},
url = {https://github.com/ethqnol/kategorize}
}