| Field | Value |
|---|---|
| Crates.io | scirs2-datasets |
| lib.rs | scirs2-datasets |
| version | 0.1.0-beta.2 |
| created_at | 2025-04-12 19:47:14.542206+00 |
| updated_at | 2025-09-20 08:57:10.128652+00 |
| description | Datasets module for SciRS2 (scirs2-datasets) |
| repository | https://github.com/cool-japan/scirs |
| id | 1631187 |
| size | 1,399,062 |
A production-ready collection of dataset utilities for the SciRS2 scientific computing library. This module provides comprehensive functionality for loading, generating, and working with datasets commonly used in scientific computing, machine learning, and statistical analysis.
This beta release (0.1.0-beta.2) has all core functionality implemented, is thoroughly tested (117+ tests), and is production-ready. The API is stable and follows Rust best practices, with zero-warning builds.
Add to your Cargo.toml:
```toml
[dependencies]
scirs2-datasets = "0.1.0-beta.2"
```
To download remote datasets, enable the `download` feature:
```toml
[dependencies]
scirs2-datasets = { version = "0.1.0-beta.2", features = ["download"] }
```
```rust
use scirs2_datasets::{load_iris, load_boston, Dataset};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load the Iris dataset
    let iris = load_iris()?;
    println!("Iris: {} samples, {} features", iris.n_samples(), iris.n_features());
    // Load the Boston housing dataset
    let boston = load_boston()?;
    println!("Boston: {} samples, {} features", boston.n_samples(), boston.n_features());
    Ok(())
}
```
```rust
use scirs2_datasets::{make_classification, make_regression, make_blobs, make_spirals};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Classification dataset
    let dataset = make_classification(1000, 10, 3, 2, 4, Some(42))?;
    println!("Classification: {} samples, {} features", dataset.n_samples(), dataset.n_features());
    // Non-linear patterns
    let spirals = make_spirals(500, 2, 0.1, Some(42))?;
    let blobs = make_blobs(300, 2, 4, 1.0, Some(42))?;
    Ok(())
}
```
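For intuition about what a blob generator such as `make_blobs` produces, here is a dependency-free sketch. It is not the crate's implementation: the function name `blobs_sketch`, the LCG random number generator, centers drawn uniformly from [-10, 10), round-robin cluster assignment, and Box-Muller Gaussian noise are all assumptions. The parameter order is only chosen to line up with the `make_blobs(300, 2, 4, 1.0, Some(42))` call above.

```rust
/// Minimal 64-bit LCG (Knuth's MMIX constants); illustrative only, not the crate's RNG.
struct Lcg(u64);

impl Lcg {
    /// Uniform sample in [0, 1) built from the top 53 bits of the state.
    fn next_f64(&mut self) -> f64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }

    /// Standard normal sample via the Box-Muller transform.
    fn next_gaussian(&mut self) -> f64 {
        let u1 = 1.0 - self.next_f64(); // in (0, 1], so ln(u1) is finite
        let u2 = self.next_f64();
        (-2.0 * u1.ln()).sqrt() * (2.0 * std::f64::consts::PI * u2).cos()
    }
}

/// Hypothetical blob generator returning (points, cluster labels).
/// Centers are uniform in [-10, 10) per feature; sample i belongs to
/// cluster i % n_centers and is jittered with N(0, cluster_std^2) noise.
fn blobs_sketch(
    n_samples: usize,
    n_features: usize,
    n_centers: usize,
    cluster_std: f64,
    seed: u64,
) -> (Vec<Vec<f64>>, Vec<usize>) {
    assert!(n_centers > 0, "need at least one center");
    let mut rng = Lcg(seed);
    let mut centers = Vec::with_capacity(n_centers);
    for _ in 0..n_centers {
        let c: Vec<f64> = (0..n_features).map(|_| rng.next_f64() * 20.0 - 10.0).collect();
        centers.push(c);
    }
    let mut points = Vec::with_capacity(n_samples);
    let mut labels = Vec::with_capacity(n_samples);
    for i in 0..n_samples {
        let c = i % n_centers;
        let p: Vec<f64> = centers[c]
            .iter()
            .map(|&mu| mu + cluster_std * rng.next_gaussian())
            .collect();
        points.push(p);
        labels.push(c);
    }
    (points, labels)
}
```

Calling the sketch twice with the same seed yields identical points, which is presumably the reproducibility the `Some(42)` arguments above are meant to provide.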
```rust
use scirs2_datasets::{load_iris, k_fold_split, stratified_k_fold_split, train_test_split};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let iris = load_iris()?;
    // K-fold cross-validation over sample indices
    let folds = k_fold_split(iris.n_samples(), 5, true, Some(42))?;
    // Simple train/test split over sample indices
    let (train_idx, test_idx) = train_test_split(iris.n_samples(), 0.8, Some(42))?;
    // Stratified splitting needs the class labels
    if let Some(target) = &iris.target {
        let stratified_folds = stratified_k_fold_split(target, 5, true, Some(42))?;
    }
    Ok(())
}
```
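To make the index-based splitting concrete, the sketch below builds unshuffled k-fold splits from plain sample indices. It is a hypothetical helper, not `k_fold_split` itself: it omits the shuffle flag and seed that the crate's function takes, assumes a `Vec<(train, test)>` return shape, and gives the first `n_samples % n_folds` folds one extra test sample (the convention scikit-learn's `KFold` uses).

```rust
/// Hypothetical, unshuffled k-fold splitter over sample indices 0..n_samples.
/// Returns one (train_indices, test_indices) pair per fold; the first
/// `n_samples % n_folds` folds receive one extra test sample.
fn k_fold_indices(n_samples: usize, n_folds: usize) -> Vec<(Vec<usize>, Vec<usize>)> {
    assert!(n_folds >= 2 && n_folds <= n_samples, "need 2 <= n_folds <= n_samples");
    let base = n_samples / n_folds;
    let extra = n_samples % n_folds;
    let mut folds = Vec::with_capacity(n_folds);
    let mut start = 0;
    for k in 0..n_folds {
        let size = base + if k < extra { 1 } else { 0 };
        let end = start + size;
        // The test fold is a contiguous block; every other index is training data.
        let test: Vec<usize> = (start..end).collect();
        let train: Vec<usize> = (0..start).chain(end..n_samples).collect();
        folds.push((train, test));
        start = end;
    }
    folds
}
```

With shuffling enabled, the usual approach is to permute the indices first (for example with a seeded Fisher-Yates shuffle) and then slice them exactly as above.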
Pre-loaded classic datasets for immediate use:
```rust
use scirs2_datasets::{load_iris, load_digits, load_wine, load_breast_cancer, load_diabetes, load_boston};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // All datasets return a Dataset<f64> with a consistent API
    let iris = load_iris()?;                 // 150 samples, 4 features, 3 classes
    let digits = load_digits()?;             // 1797 samples, 64 features, 10 classes
    let wine = load_wine()?;                 // 178 samples, 13 features, 3 classes
    let cancer = load_breast_cancer()?;      // 569 samples, 30 features, 2 classes
    let diabetes = load_diabetes()?;         // 442 samples, 10 features, regression
    let boston = load_boston()?;             // 506 samples, 13 features, regression
    Ok(())
}
```
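The shared dataset shape can be pictured with a small std-only stand-in. `DatasetSketch` is hypothetical: it keeps only what the examples here touch (`n_samples()`, `n_features()`, and an optional `target` field) and stores rows as nested `Vec`s so it runs without dependencies. The crate's real field names and storage types may differ.

```rust
/// Hypothetical stand-in for the crate's dataset type, reduced to a
/// row-major feature matrix, an optional target, and two accessors.
struct DatasetSketch<T> {
    data: Vec<Vec<T>>,      // one inner Vec per sample (row)
    target: Option<Vec<T>>, // class labels or regression values, if any
}

impl<T> DatasetSketch<T> {
    /// Number of rows (samples).
    fn n_samples(&self) -> usize {
        self.data.len()
    }

    /// Number of columns, read from the first row (0 for an empty dataset).
    fn n_features(&self) -> usize {
        self.data.first().map_or(0, |row| row.len())
    }
}
```

Keeping the target optional is what lets the same type serve both labelled datasets and unlabelled ones, which is why the splitting example guards the stratified call with `if let Some(target)`.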
Comprehensive synthetic dataset creation:
```rust
use scirs2_datasets::{
    make_classification, make_regression, make_blobs, make_circles,
    make_moons, make_spirals, make_swiss_roll, make_time_series,
};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Linear and non-linear patterns
    let classification = make_classification(500, 8, 2, 1, 2, Some(42))?;
    let regression = make_regression(400, 5, 3, 0.1, Some(42))?;
    let circles = make_circles(300, 0.1, Some(42))?;
    let moons = make_moons(200, 0.05, Some(42))?;
    // Complex patterns
    let spirals = make_spirals(600, 3, 0.2, Some(42))?;
    let swiss_roll = make_swiss_roll(800, 0.1, Some(42))?;
    // Time series
    let ts = make_time_series(1000, 24, 0.1, Some(42))?;
    Ok(())
}
```
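As a rough picture of a two-circles generator like `make_circles`, the sketch below places half the points on an outer unit circle and half on an inner circle, then adds Gaussian noise. Everything here is an assumption rather than the crate's code: the name `circles_sketch`, the LCG with Box-Muller noise, evenly spaced angles, and the extra `factor` argument for the inner radius, which is borrowed from scikit-learn's `make_circles` (default 0.8) and does not appear in the crate call above.

```rust
use std::f64::consts::PI;

/// Minimal 64-bit LCG (Knuth's MMIX constants); illustrative only.
struct Lcg(u64);

impl Lcg {
    /// Uniform sample in [0, 1) from the top 53 bits of the state.
    fn next_f64(&mut self) -> f64 {
        self.0 = self.0.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }

    /// Standard normal sample via the Box-Muller transform.
    fn next_gaussian(&mut self) -> f64 {
        let u1 = 1.0 - self.next_f64(); // in (0, 1], so ln(u1) is finite
        let u2 = self.next_f64();
        (-2.0 * u1.ln()).sqrt() * (2.0 * PI * u2).cos()
    }
}

/// Hypothetical two-circle generator returning (points, labels).
/// Label 0: outer circle of radius 1.0; label 1: inner circle of radius `factor`.
/// Angles are evenly spaced; each coordinate gets N(0, noise^2) jitter.
fn circles_sketch(n_samples: usize, noise: f64, factor: f64, seed: u64) -> (Vec<[f64; 2]>, Vec<usize>) {
    let mut rng = Lcg(seed);
    let n_outer = n_samples / 2;
    let n_inner = n_samples - n_outer;
    let mut points = Vec::with_capacity(n_samples);
    let mut labels = Vec::with_capacity(n_samples);
    for (label, count, radius) in [(0usize, n_outer, 1.0), (1, n_inner, factor)] {
        for i in 0..count {
            let theta = 2.0 * PI * i as f64 / count as f64;
            points.push([
                radius * theta.cos() + noise * rng.next_gaussian(),
                radius * theta.sin() + noise * rng.next_gaussian(),
            ]);
            labels.push(label);
        }
    }
    (points, labels)
}
```

With `noise = 0.0` every point lies exactly on its circle, which makes the two classes separable by radius but not by any straight line; the noise term is what makes the task harder.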
Complete toolkit for dataset manipulation:
```rust
use scirs2_datasets::{
    // Cross-validation
    k_fold_split, stratified_k_fold_split, time_series_split,
    // Sampling
    random_sample, stratified_sample, bootstrap_sample, importance_sample,
    // Balancing
    create_balanced_dataset, random_oversample, random_undersample,
    // Feature engineering
    polynomial_features, create_binned_features, statistical_features,
    // Scaling
    min_max_scale, robust_scale, normalize
};
```
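Of the scaling utilities listed, min-max scaling is the simplest to show: each feature column is mapped to [0, 1] via x' = (x - min) / (max - min). The function below is a hypothetical std-only sketch of that formula, not `min_max_scale` itself; in particular, sending constant columns to 0.0 is an assumption, and the crate may treat them differently.

```rust
/// Hypothetical column-wise min-max scaler operating in place.
/// Assumes every row has the same length. Constant columns (max == min)
/// are set to 0.0 here to avoid dividing by zero.
fn min_max_scale_sketch(data: &mut [Vec<f64>]) {
    let n_features = data.first().map_or(0, |row| row.len());
    for j in 0..n_features {
        // Find the column's minimum and maximum in one pass.
        let (min, max) = data
            .iter()
            .fold((f64::INFINITY, f64::NEG_INFINITY), |(lo, hi), row| {
                (lo.min(row[j]), hi.max(row[j]))
            });
        let range = max - min;
        for row in data.iter_mut() {
            row[j] = if range > 0.0 { (row[j] - min) / range } else { 0.0 };
        }
    }
}
```

Because min and max come from the data itself, a single outlier can squeeze the remaining values into a narrow band; that sensitivity is the usual motivation for a robust alternative such as the `robust_scale` listed above.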
Efficient dataset management with automatic caching:
```rust
use scirs2_datasets::{CacheManager, DatasetCache};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let cache = CacheManager::new()?;
    let stats = cache.get_statistics()?;
    println!("Cache contains {} datasets using {} MB",
             stats.total_files, stats.total_size_mb);
    Ok(())
}
```
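Cache statistics like the two printed above can be computed by scanning a cache directory and summing file sizes. The sketch below is hypothetical and std-only: `CacheStatsSketch` and `cache_stats_sketch` are illustrative names, the scan is non-recursive, and "MB" is taken as 1024 × 1024 bytes. The crate's actual cache location, layout, and units are not stated here.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical mirror of the two statistics printed above.
#[derive(Debug)]
struct CacheStatsSketch {
    total_files: usize,
    total_size_mb: f64,
}

/// Counts regular files directly inside `dir` (no recursion) and sums their sizes.
/// "MB" here means 1024 * 1024 bytes; that unit is an assumption.
fn cache_stats_sketch(dir: &Path) -> io::Result<CacheStatsSketch> {
    let mut total_files = 0;
    let mut total_bytes: u64 = 0;
    for entry in fs::read_dir(dir)? {
        let meta = entry?.metadata()?;
        // Skip subdirectories and anything else that is not a regular file.
        if meta.is_file() {
            total_files += 1;
            total_bytes += meta.len();
        }
    }
    Ok(CacheStatsSketch {
        total_files,
        total_size_mb: total_bytes as f64 / (1024.0 * 1024.0),
    })
}
```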
The API is stable and production-ready. All public functions are thoroughly documented with examples. Breaking changes will only occur in major version updates (1.0.0+).
Seamlessly integrates with other SciRS2 modules:
```rust
use scirs2_datasets::{load_iris, make_classification};
// Use with scirs2-stats, scirs2-linalg, etc.
```
See the project CONTRIBUTING.md for guidelines. Focus areas for contributions:
Dual-licensed under MIT or Apache License 2.0.