sklears-datasets

version: 0.1.0-beta.1
created_at: 2025-10-13 12:08:29.775738+00
updated_at: 2026-01-01 21:29:02.057055+00
description: Dataset utilities and generation for sklears
homepage: https://github.com/cool-japan/sklears
repository: https://github.com/cool-japan/sklears
id: 1880447
size: 1,086,280
KitaSan (cool-japan)


sklears-datasets


Latest release: 0.1.0-beta.1 (January 1, 2026). See the workspace release notes for highlights and upgrade guidance.

Overview

sklears-datasets centralizes dataset loaders, synthetic generators, and data utilities used throughout the sklears ecosystem. It mirrors scikit-learn’s dataset module while adding Rust-first performance and IO enhancements.

Key Features

  • Classic Loaders: Diabetes, Iris, Digits, Wine, Breast Cancer, 20 Newsgroups, and more.
  • Synthetic Generators: make_blobs, make_moons, make_circles, Gaussian quantiles, regression surfaces, and streaming generators.
  • File IO: CSV, Parquet, Arrow IPC, and memory-mapped dataset support with Polars integration.
  • Benchmark Utilities: Deterministic dataset splits and sampling strategies for reproducible experiments.
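The deterministic splits mentioned in the last bullet can be illustrated without the crate itself. The following is a minimal, std-only sketch, assuming a SplitMix64 generator and a Fisher-Yates shuffle; `SplitMix64` and `train_test_split_indices` are hypothetical names chosen for illustration and are not the sklears-datasets API.

```rust
/// Small deterministic PRNG (SplitMix64). Illustrative only; not part of sklears-datasets.
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Hypothetical helper: shuffle the indices 0..n_samples with a seeded
/// Fisher-Yates pass, then cut the permutation into (train, test) lists.
fn train_test_split_indices(
    n_samples: usize,
    test_fraction: f64,
    seed: u64,
) -> (Vec<usize>, Vec<usize>) {
    let mut indices: Vec<usize> = (0..n_samples).collect();
    let mut rng = SplitMix64::new(seed);
    for i in (1..n_samples).rev() {
        // The modulo introduces a slight bias, which is acceptable for a sketch.
        let j = (rng.next_u64() % (i as u64 + 1)) as usize;
        indices.swap(i, j);
    }
    // Round the requested fraction to a whole number of test samples.
    let n_test = (((n_samples as f64) * test_fraction).round() as usize).min(n_samples);
    let test = indices.split_off(n_samples - n_test);
    (indices, test)
}
```

Because the permutation depends only on `n_samples` and `seed`, calling the helper twice with the same arguments yields the same train and test indices, which is the property a reproducible experiment relies on.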

Quick Start

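use sklears_datasets::{load_iris, make_blobs};

// Box<dyn Error> assumes the crate's error types implement std::error::Error.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Built-in dataset
    let iris = load_iris()?;
    println!("{} samples, {} features", iris.data.nrows(), iris.data.ncols());

    // Synthetic data: 1,000 samples, 10 features, 4 centers, fixed seed
    let blobs = make_blobs(1000)
        .n_features(10)
        .centers(4)
        .cluster_std(2.5)
        .random_state(Some(42))
        .build()?;

    Ok(())
}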
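To make concrete what a blobs-style generator computes, here is a std-only sketch that draws isotropic Gaussian points around fixed centers. It is an illustration under stated assumptions, not the crate's implementation: `XorShift64` and `sketch_blobs` are hypothetical names, labels are assigned round-robin, and normals come from a Box-Muller transform.

```rust
/// Minimal xorshift64 PRNG. Illustrative only; not part of sklears-datasets.
struct XorShift64(u64);

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // xorshift must not start from an all-zero state.
        Self(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    /// Uniform sample in [0, 1) built from the top 53 bits of the state.
    fn next_f64(&mut self) -> f64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }

    /// Standard normal sample via the Box-Muller transform.
    fn next_normal(&mut self) -> f64 {
        let u1 = 1.0 - self.next_f64(); // in (0, 1], so ln(u1) is finite
        let u2 = self.next_f64();
        (-2.0 * u1.ln()).sqrt() * (2.0 * std::f64::consts::PI * u2).cos()
    }
}

/// Hypothetical stand-in for a blobs generator: returns (rows, labels),
/// where each row is its center plus Gaussian noise scaled by `cluster_std`.
fn sketch_blobs(
    n_samples: usize,
    centers: &[Vec<f64>],
    cluster_std: f64,
    seed: u64,
) -> (Vec<Vec<f64>>, Vec<usize>) {
    assert!(!centers.is_empty(), "at least one center is required");
    let mut rng = XorShift64::new(seed);
    let mut data = Vec::with_capacity(n_samples);
    let mut labels = Vec::with_capacity(n_samples);
    for i in 0..n_samples {
        let label = i % centers.len(); // round-robin cluster assignment
        let row: Vec<f64> = centers[label]
            .iter()
            .map(|c| c + cluster_std * rng.next_normal())
            .collect();
        data.push(row);
        labels.push(label);
    }
    (data, labels)
}
```

In this sketch the seed plays the role that `random_state` plays in the Quick Start: the same seed reproduces the same points.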

Status

  • All loaders and generators are validated by the 11,292 passing workspace tests for 0.1.0-beta.1.
  • Supports lazy loading and streaming for large-scale workflows.
  • Future work (federated dataset shards, synthetic time series) tracked in this crate’s TODO.md.
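The lazy, streaming style of generation mentioned above can be pictured as an iterator that materializes one batch at a time instead of the full dataset. The sketch below is a std-only illustration under that assumption; `RegressionStream` and its fields are hypothetical and do not reflect the crate's actual streaming API.

```rust
/// Hypothetical lazy generator: yields batches of (x, y) pairs sampled from
/// the noiseless line y = slope * x + intercept, for x = 0, 1, 2, ...
struct RegressionStream {
    next_index: usize,
    n_samples: usize,
    batch_size: usize,
    slope: f64,
    intercept: f64,
}

impl RegressionStream {
    fn new(n_samples: usize, batch_size: usize, slope: f64, intercept: f64) -> Self {
        // A zero batch size would make the iterator yield empty batches forever.
        assert!(batch_size > 0, "batch_size must be positive");
        Self { next_index: 0, n_samples, batch_size, slope, intercept }
    }
}

impl Iterator for RegressionStream {
    type Item = Vec<(f64, f64)>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.next_index >= self.n_samples {
            return None; // stream exhausted
        }
        // The final batch may be shorter than batch_size.
        let end = (self.next_index + self.batch_size).min(self.n_samples);
        let batch = (self.next_index..end)
            .map(|i| {
                let x = i as f64;
                (x, self.slope * x + self.intercept)
            })
            .collect();
        self.next_index = end;
        Some(batch)
    }
}
```

Only one batch is held in memory at a time, so a consumer can iterate with `for batch in RegressionStream::new(1_000_000, 1024, 2.0, 1.0)` without allocating the whole dataset.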