rotoml

version: 0.1.2
created_at: 2025-04-11 22:11:39.486832+00
updated_at: 2026-01-03 09:46:46.198952+00
description: A native Rust AutoML pipeline toolkit
repository: https://github.com/okanyenigun/rotoml
id: 1630387
size: 88,948
Okan Yenigün (okanyenigun)

RotoML

A native Rust AutoML toolkit for machine learning pipelines with powerful data analysis and manipulation capabilities.

🚀 Version 0.1.2: Enhanced data operations and comprehensive analysis features.

Features

Data Loading

  • Multi-format support: Load CSV and Parquet files
  • Auto-detection: Automatically detects file format from extension
  • Fast processing: Built on Polars for high-performance data operations
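
As an illustration of extension-based auto-detection, here is a minimal sketch that uses only the standard library. The `FileFormat` enum and `detect_format` function are hypothetical names chosen for this example; they are not taken from RotoML's source, and the crate's actual detection logic may differ.

```rust
use std::path::Path;

/// The two formats the feature list names as supported.
/// Hypothetical type for this sketch, not RotoML's own.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum FileFormat {
    Csv,
    Parquet,
}

/// Guess the format from the file extension, ignoring case.
/// Returns `None` when the extension is missing or not recognized.
pub fn detect_format(path: &str) -> Option<FileFormat> {
    let ext = Path::new(path).extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "csv" => Some(FileFormat::Csv),
        "parquet" => Some(FileFormat::Parquet),
        _ => None,
    }
}
```

A loader built this way would dispatch to a CSV or Parquet reader based on the returned variant and report an error for `None`.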

Data Operations

  • Column operations: Drop single or multiple columns
  • Row operations: Drop rows by index
  • Duplicate detection: Identify duplicate columns and rows
  • Data validation: Comprehensive error handling and validation
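
To make "drop rows by index" with validation concrete, the sketch below works on a plain `Vec` of rows rather than a Polars DataFrame. The function name mirrors the operation, but the signature, the `String` error type, and the choice to reject out-of-bounds indexes are assumptions made for this example only.

```rust
use std::collections::HashSet;

/// Remove the rows at `indexes` from `rows`, keeping the remaining order.
/// All indexes are validated first, so nothing is dropped if any index
/// is out of bounds. (Assumed behavior for this sketch.)
pub fn drop_rows<T>(rows: Vec<T>, indexes: &[usize]) -> Result<Vec<T>, String> {
    if let Some(bad) = indexes.iter().find(|&&i| i >= rows.len()) {
        return Err(format!(
            "row index {} out of bounds for {} rows",
            bad,
            rows.len()
        ));
    }
    // A set makes the membership check cheap and tolerates repeated indexes.
    let to_drop: HashSet<usize> = indexes.iter().copied().collect();
    Ok(rows
        .into_iter()
        .enumerate()
        .filter(|(i, _)| !to_drop.contains(i))
        .map(|(_, row)| row)
        .collect())
}
```

Validating before mutating means a caller never receives a partially modified table.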

Data Analysis & Reporting

  • Automated analysis: Generate comprehensive data reports in Markdown
  • Quality metrics: Missing values, data completeness, type analysis
  • Duplicate analysis: Detect and report duplicate columns and rows
  • Statistical insights: Numeric and categorical column counts
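
The quality metrics above come down to simple ratios. The following sketch shows one way to compute a per-column missing-value percentage and an overall completeness figure, with `Option` standing in for nullable cells. `MissingSummary`, `missing_summary`, and `completeness_pct` are hypothetical names, and the exact definitions RotoML uses are not documented here.

```rust
/// Missing-value summary for one column (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct MissingSummary {
    pub missing: usize,
    pub total: usize,
    pub missing_pct: f64,
}

/// Count `None` cells in a column and express them as a percentage.
pub fn missing_summary<T>(column: &[Option<T>]) -> MissingSummary {
    let total = column.len();
    let missing = column.iter().filter(|v| v.is_none()).count();
    let missing_pct = if total == 0 {
        0.0
    } else {
        100.0 * missing as f64 / total as f64
    };
    MissingSummary { missing, total, missing_pct }
}

/// Share of non-missing cells across all columns, in percent.
/// An empty table is treated as fully complete (an assumption).
pub fn completeness_pct<T>(columns: &[Vec<Option<T>>]) -> f64 {
    let total: usize = columns.iter().map(|c| c.len()).sum();
    if total == 0 {
        return 100.0;
    }
    let present: usize = columns
        .iter()
        .map(|c| c.iter().filter(|v| v.is_some()).count())
        .sum();
    100.0 * present as f64 / total as f64
}
```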

Installation

To install the command-line tool:

cargo install rotoml

To use RotoML as a library instead, add it to your Cargo.toml:

[dependencies]
rotoml = "0.1.2"

Usage

Command Line

# Analyze CSV file
rotoml --file data.csv

# Analyze Parquet file
rotoml --file data.parquet
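
For readers curious how a `--file <path>` flag like the one above can be handled, here is a small standard-library sketch. `parse_file_arg` is a hypothetical helper; RotoML's real argument parsing is not shown in this README and may use a dedicated crate.

```rust
/// Return the value that follows `--file` in a list of arguments.
/// Errors if the flag is absent or has no value after it.
pub fn parse_file_arg<I, S>(args: I) -> Result<String, String>
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    let mut iter = args.into_iter();
    while let Some(arg) = iter.next() {
        if arg.as_ref() == "--file" {
            return iter
                .next()
                .map(|v| v.as_ref().to_string())
                .ok_or_else(|| "--file requires a path".to_string());
        }
    }
    Err("missing required --file <path> argument".to_string())
}
```

In a binary this could be called as `parse_file_arg(std::env::args())`.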

As a Library

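use rotoml::data_loader::DataLoader;
use rotoml::data_operations::DataOperations;
use rotoml::data_reporter::DataReporter;

// `?` needs an enclosing function that returns a Result.
// Box<dyn Error> is assumed here to accept the crate's error types.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load data (the format is detected from the file extension)
    let df = DataLoader::load("data.csv")?;

    // Detect duplicates: a count plus the indexes of the duplicate rows
    let (dup_count, dup_indexes) = DataOperations::count_duplicate_rows(&df)?;
    let duplicate_columns = DataOperations::detect_duplicate_columns(&df)?;

    // Drop columns by name
    let df = DataOperations::drop_columns(df, &["col1", "col2"])?;

    // Drop rows by index
    let df = DataOperations::drop_rows(df, &[0, 5, 10])?;

    // Generate a Markdown report
    DataReporter::generate_data_report(&df, "data.csv", "report.md")?;

    Ok(())
}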

API Documentation

DataLoader

  • load(file_path) - Auto-detect and load CSV or Parquet
  • load_csv(file_path) - Load CSV file
  • load_parquet(file_path) - Load Parquet file

DataOperations

  • drop_column(df, column_name) - Drop a single column
  • drop_columns(df, column_names) - Drop multiple columns
  • drop_rows(df, indexes) - Drop rows by index
  • detect_duplicate_columns(df) - Find duplicate columns
  • count_duplicate_rows(df) - Count and list duplicate rows
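
The duplicate-detection functions can be pictured with the stand-alone sketch below, which hashes whole rows or whole columns. `find_duplicate_rows` and `find_duplicate_columns` are hypothetical names operating on plain vectors. The sketch assumes a row counts as a duplicate only when it repeats an earlier row, and RotoML's own convention may differ.

```rust
use std::collections::{HashMap, HashSet};
use std::hash::Hash;

/// Return how many rows repeat an earlier row, plus their indexes.
/// The first occurrence of each row is not counted (assumed convention).
pub fn find_duplicate_rows<T: Hash + Eq>(rows: &[Vec<T>]) -> (usize, Vec<usize>) {
    let mut seen: HashSet<&[T]> = HashSet::new();
    let mut dup_indexes = Vec::new();
    for (i, row) in rows.iter().enumerate() {
        // `insert` returns false when an equal row is already in the set.
        if !seen.insert(row.as_slice()) {
            dup_indexes.push(i);
        }
    }
    (dup_indexes.len(), dup_indexes)
}

/// Return (original, duplicate) name pairs for columns whose values
/// are identical to those of an earlier column.
pub fn find_duplicate_columns<T: Hash + Eq>(
    columns: &[(&str, Vec<T>)],
) -> Vec<(String, String)> {
    let mut first_by_values: HashMap<&[T], &str> = HashMap::new();
    let mut pairs = Vec::new();
    for (name, values) in columns {
        match first_by_values.get(values.as_slice()).copied() {
            Some(original) => pairs.push((original.to_string(), name.to_string())),
            None => {
                first_by_values.insert(values.as_slice(), *name);
            }
        }
    }
    pairs
}
```

Returning both a count and the indexes matches the tuple shape used in the library example, where the result is destructured as `(dup_count, dup_indexes)`.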

DataReporter

  • generate_data_report(df, file_name, output_path) - Generate a Markdown analysis report for df and write it to output_path

Output Example

The generated report includes:

  • DataFrame shape and column types
  • Missing values analysis with percentages
  • Data quality metrics
  • Duplicate columns detection
  • Duplicate rows analysis with indexes
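
To give a feel for what such a report might contain, the sketch below renders a shape line and a missing-values table as Markdown. The layout, the `ColumnInfo` type, and `render_report` are all assumptions made for illustration; the actual report produced by `generate_data_report` may be structured differently.

```rust
/// Per-column facts used by the sketch (hypothetical type).
pub struct ColumnInfo {
    pub name: String,
    pub dtype: String,
    pub missing: usize,
}

/// Build a small Markdown report: a title, the table shape, and one
/// row per column with its type and missing-value percentage.
pub fn render_report(file_name: &str, n_rows: usize, columns: &[ColumnInfo]) -> String {
    let mut out = String::new();
    out.push_str(&format!("# Data Report: {}\n\n", file_name));
    out.push_str(&format!(
        "Shape: {} rows x {} columns\n\n",
        n_rows,
        columns.len()
    ));
    out.push_str("| Column | Type | Missing | Missing % |\n");
    out.push_str("|---|---|---|---|\n");
    for col in columns {
        // Guard against division by zero for an empty table.
        let pct = if n_rows == 0 {
            0.0
        } else {
            100.0 * col.missing as f64 / n_rows as f64
        };
        out.push_str(&format!(
            "| {} | {} | {} | {:.1} |\n",
            col.name, col.dtype, col.missing, pct
        ));
    }
    out
}
```

The returned string could then be written to a path such as `report.md` with `std::fs::write`.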

Future Vision

RotoML is evolving into a complete automated machine learning pipeline:

  • Feature engineering and selection
  • Model selection and hyperparameter tuning
  • Automated training and evaluation
  • Pipeline orchestration

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT License - see LICENSE file for details.

Author

Okan Yenigün (okanyenigun@gmail.com)
