Crates.io | pandrs |
lib.rs | pandrs |
version | 0.1.0-beta.2 |
created_at | 2025-04-18 00:03:10.059965+00 |
updated_at | 2025-09-21 10:20:48.528727+00 |
description | A high-performance DataFrame library for Rust, providing pandas-like API with advanced features including SIMD optimization, parallel processing, and distributed computing capabilities |
homepage | |
repository | https://github.com/cool-japan/pandrs |
max_upload_size | |
id | 1638645 |
size | 6,956,242 |
🚀 Beta Release (0.1.0-beta.2) - Latest Available: This feature-complete beta release is ready for production use. With 345+ tests, optimized performance, and extensive documentation, PandRS delivers a robust pandas-like experience for Rust developers. Published to crates.io in September 2025.
PandRS is a comprehensive data manipulation library that brings the power and familiarity of pandas to the Rust ecosystem. It is built with performance, safety, and ease of use in mind.
use pandrs::{DataFrame, Series};
use std::collections::HashMap;
// Create a DataFrame
let mut df = DataFrame::new();
df.add_column("name".to_string(),
    Series::from_vec(vec!["Alice", "Bob", "Carol"], Some("name")))?;
df.add_column("age".to_string(),
    Series::from_vec(vec![30, 25, 35], Some("age")))?;
df.add_column("salary".to_string(),
    Series::from_vec(vec![75000.0, 65000.0, 85000.0], Some("salary")))?;
// Grouping key used by the groupby call below (illustrative values)
df.add_column("department".to_string(),
    Series::from_vec(vec!["Engineering", "Sales", "Engineering"], Some("department")))?;
// Perform operations
let filtered = df.filter("age > 25")?;
let mean_salary = df.column("salary")?.mean()?;
let grouped = df.groupby(vec!["department"])?.agg(HashMap::from([
    ("salary".to_string(), vec!["mean", "sum"]),
    ("age".to_string(), vec!["max"]),
]))?;
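To make the grouped aggregation in the example above concrete, here is a minimal sketch in plain Rust (standard library only) of what a per-group `mean` and `sum` compute. It is not PandRS's implementation; `GroupStats`, `group_mean_sum`, and the department labels are hypothetical names and values chosen for illustration.

```rust
use std::collections::BTreeMap;

/// Per-group summary of one numeric column (hypothetical helper type).
#[derive(Debug, PartialEq)]
struct GroupStats {
    mean: f64,
    sum: f64,
}

/// Groups `(key, value)` rows by key and computes the mean and sum per group.
/// A BTreeMap keeps the groups in sorted key order.
fn group_mean_sum(rows: &[(&str, f64)]) -> BTreeMap<String, GroupStats> {
    // First pass: accumulate (sum, count) for every key.
    let mut acc: BTreeMap<String, (f64, usize)> = BTreeMap::new();
    for &(key, value) in rows {
        let entry = acc.entry(key.to_string()).or_insert((0.0, 0));
        entry.0 += value;
        entry.1 += 1;
    }
    // Second pass: turn the accumulators into summary statistics.
    acc.into_iter()
        .map(|(key, (sum, count))| {
            let stats = GroupStats { mean: sum / count as f64, sum };
            (key, stats)
        })
        .collect()
}

fn main() {
    // Hypothetical department labels paired with the three salaries used above.
    let rows = [
        ("Engineering", 75000.0),
        ("Sales", 65000.0),
        ("Engineering", 85000.0),
    ];
    for (department, stats) in group_mean_sum(&rows) {
        println!("{department}: mean = {}, sum = {}", stats.mean, stats.sum);
    }
}
```

A DataFrame library typically does the same two steps column by column: bucket rows by key, then reduce each bucket.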
Supported element types include i32, i64, f32, f64, u32, and u64, with NA support across all types.
Add to your Cargo.toml:
[dependencies]
pandrs = "0.1.0-beta.2"
Enable additional functionality with feature flags:
[dependencies]
pandrs = { version = "0.1.0-beta.2", features = ["stable"] }
Available features:
- stable: Recommended stable feature set
- optimized: Performance optimizations and SIMD
- backward_compat: Backward compatibility support
- parquet: Parquet file support
- excel: Excel file support
- sql: Database connectivity
- distributed: Distributed computing with DataFusion
- visualization: Plotting capabilities
- streaming: Real-time data processing
- serving: Model serving and deployment
- cuda: GPU acceleration (requires CUDA toolkit)
- wasm: WebAssembly compilation support
- jit: Just-in-time compilation
- all-safe: All stable features (recommended)
- test-safe: Features safe for testing
Performance comparison with pandas (Python) and Polars (Rust):
Operation | PandRS | Pandas | Polars | Speedup vs Pandas |
---|---|---|---|---|
CSV Read (1M rows) | 0.18s | 0.92s | 0.15s | 5.1x |
GroupBy Sum | 0.09s | 0.31s | 0.08s | 3.4x |
Join Operations | 0.21s | 0.87s | 0.19s | 4.1x |
String Operations | 0.14s | 1.23s | 0.16s | 8.8x |
Rolling Window | 0.11s | 0.43s | 0.12s | 3.9x |
Benchmarks performed on AMD Ryzen 9 5950X, 64GB RAM, NVMe SSD
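The "Speedup vs Pandas" column appears to be the pandas time divided by the PandRS time, rounded to one decimal place. The sketch below recomputes it from the timings in the table under that assumption; the `speedup` helper is a hypothetical name, not part of PandRS.

```rust
/// How many times faster the candidate ran than the baseline.
fn speedup(baseline_secs: f64, candidate_secs: f64) -> f64 {
    baseline_secs / candidate_secs
}

fn main() {
    // (operation, PandRS seconds, pandas seconds), copied from the table above.
    let rows = [
        ("CSV Read (1M rows)", 0.18, 0.92),
        ("GroupBy Sum", 0.09, 0.31),
        ("Join Operations", 0.21, 0.87),
        ("String Operations", 0.14, 1.23),
        ("Rolling Window", 0.11, 0.43),
    ];
    for (operation, pandrs_secs, pandas_secs) in rows {
        // Print with one decimal place, matching the table's precision.
        println!("{operation}: {:.1}x", speedup(pandas_secs, pandrs_secs));
    }
}
```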
use pandrs::prelude::*;
use std::collections::HashMap;
// Load a CSV file with default options
let df = DataFrame::read_csv("data.csv", CsvReadOptions::default())?;
// Basic statistics
let stats = df.describe()?;
println!("Data statistics:\n{}", stats);
// Filtering and aggregation
let result = df
    .filter("age >= 18 && income > 50000")?
    .groupby(vec!["city", "occupation"])?
    .agg(HashMap::from([
        ("income".to_string(), vec!["mean", "median", "std"]),
        ("age".to_string(), vec!["mean"]),
    ]))?
    // Sort by mean income, highest first
    .sort_values(vec!["income_mean"], vec![false])?;
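For readers unfamiliar with the `median` and `std` aggregations named above, the following plain-Rust sketch shows one common definition of each. It assumes the sample standard deviation (dividing by n - 1), which is the pandas default; whether PandRS uses the same convention is not stated here. The function names are hypothetical.

```rust
/// Arithmetic mean; None for an empty slice.
fn mean(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    Some(values.iter().sum::<f64>() / values.len() as f64)
}

/// Median: the middle value of the sorted data, or the average of the
/// two middle values when the length is even. None for an empty slice.
fn median(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 1 {
        Some(sorted[mid])
    } else {
        Some((sorted[mid - 1] + sorted[mid]) / 2.0)
    }
}

/// Sample standard deviation (divides by n - 1); None for fewer than two values.
fn sample_std(values: &[f64]) -> Option<f64> {
    if values.len() < 2 {
        return None;
    }
    let m = mean(values)?;
    let sum_sq: f64 = values.iter().map(|v| (v - m).powi(2)).sum();
    Some((sum_sq / (values.len() - 1) as f64).sqrt())
}

fn main() {
    // Hypothetical incomes for one (city, occupation) group.
    let incomes = [52000.0, 60000.0, 68000.0, 90000.0];
    println!("mean   = {:?}", mean(&incomes));
    println!("median = {:?}", median(&incomes));
    println!("std    = {:?}", sample_std(&incomes));
}
```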
use pandrs::prelude::*;
use std::collections::HashMap;
// Load the data and index it by timestamp
let mut df = DataFrame::read_csv("timeseries.csv", CsvReadOptions::default())?;
df.set_index("timestamp")?;
// Resample to daily frequency
let daily = df.resample("D")?.mean()?;
// Calculate rolling statistics over a 7-row trailing window
let rolling_stats = daily
    .rolling(RollingOptions {
        window: 7,
        min_periods: Some(1),
        center: false,
    })?
    .agg(HashMap::from([
        ("value".to_string(), vec!["mean", "std"]),
    ]))?;
// Exponentially weighted moving average
let ewm = daily.ewm(EwmOptions {
    span: Some(10.0),
    ..Default::default()
})?;
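The `window`, `min_periods`, and `span` options above follow pandas terminology. As a rough illustration of what they mean, here is a standard-library sketch of a trailing rolling mean and an exponentially weighted mean. It assumes the simple recursive form with alpha = 2 / (span + 1) (what pandas calls `adjust=False`); PandRS's defaults may differ, and both function names are hypothetical.

```rust
/// Trailing rolling mean over at most `window` observations ending at each index.
/// Positions with fewer than `min_periods` observations yield None.
fn rolling_mean(values: &[f64], window: usize, min_periods: usize) -> Vec<Option<f64>> {
    (0..values.len())
        .map(|i| {
            // First index of the window that ends at i (clamped at 0).
            let start = (i + 1).saturating_sub(window);
            let slice = &values[start..=i];
            if slice.len() >= min_periods.max(1) {
                Some(slice.iter().sum::<f64>() / slice.len() as f64)
            } else {
                None
            }
        })
        .collect()
}

/// Exponentially weighted mean in recursive form:
/// y[0] = x[0], y[t] = alpha * x[t] + (1 - alpha) * y[t - 1],
/// where alpha = 2 / (span + 1).
fn ewm_mean(values: &[f64], span: f64) -> Vec<f64> {
    let alpha = 2.0 / (span + 1.0);
    let mut out = Vec::with_capacity(values.len());
    let mut prev: Option<f64> = None;
    for &x in values {
        let y = match prev {
            None => x,
            Some(p) => alpha * x + (1.0 - alpha) * p,
        };
        out.push(y);
        prev = Some(y);
    }
    out
}

fn main() {
    let daily = [1.0, 2.0, 3.0, 4.0, 5.0];
    println!("rolling(3, min_periods=1): {:?}", rolling_mean(&daily, 3, 1));
    println!("rolling(3, min_periods=3): {:?}", rolling_mean(&daily, 3, 3));
    println!("ewm(span=3): {:?}", ewm_mean(&daily, 3.0));
}
```

With `min_periods` set to 1, the first rows are averaged over however many observations exist so far; raising it to the window size leaves those rows missing instead.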
use pandrs::prelude::*;
// Load and preprocess data
let df = DataFrame::read_parquet("features.parquet")?;
// Handle missing values
let df_filled = df.fillna(FillNaOptions::Forward)?;
// Encode categorical variables
let df_encoded = df_filled.get_dummies(vec!["category1", "category2"], None)?;
// Normalize numerical features to zero mean and unit standard deviation
let features = vec!["feature1", "feature2", "feature3"];
let df_normalized = df_encoded.apply_columns(&features, |series| {
    let mean = series.mean()?;
    let std = series.std(1)?;
    series.sub_scalar(mean)?.div_scalar(std)
})?;
// Split features (x) and target (y)
let x = df_normalized.drop(vec!["target"])?;
let y = df_normalized.column("target")?;
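Two of the preprocessing steps above, forward fill and z-score normalization, are easy to state precisely. The sketch below implements both on plain slices using only the standard library, with `Option<f64>` standing in for NA. It illustrates the technique rather than PandRS internals, and the function names are hypothetical.

```rust
/// Forward fill: replace each missing value with the most recent observed one.
/// Leading missing values stay missing because nothing precedes them.
fn forward_fill(values: &[Option<f64>]) -> Vec<Option<f64>> {
    let mut last = None;
    values
        .iter()
        .map(|v| {
            if v.is_some() {
                last = *v;
            }
            last
        })
        .collect()
}

/// Z-score normalization: subtract the mean and divide by the sample
/// standard deviation (n - 1 in the denominator).
/// Returns None for fewer than two values or a zero standard deviation.
fn zscore(values: &[f64]) -> Option<Vec<f64>> {
    let n = values.len();
    if n < 2 {
        return None;
    }
    let mean = values.iter().sum::<f64>() / n as f64;
    let variance = values.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / (n - 1) as f64;
    let std = variance.sqrt();
    if std == 0.0 {
        return None;
    }
    Some(values.iter().map(|v| (v - mean) / std).collect())
}

fn main() {
    let raw = [None, Some(1.0), None, None, Some(4.0), None];
    println!("filled: {:?}", forward_fill(&raw));
    println!("zscore: {:?}", zscore(&[1.0, 2.0, 3.0]));
}
```

The zero-deviation guard matters in practice: a constant feature column would otherwise produce a division by zero during normalization.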
We welcome contributions! Please see our Contributing Guide for details.
# Clone the repository
git clone https://github.com/cool-japan/pandrs
cd pandrs
# Install development dependencies
cargo install cargo-nextest cargo-criterion
# Run tests
cargo nextest run
# Run benchmarks
cargo criterion
# Check code quality
cargo clippy -- -D warnings
cargo fmt -- --check
Licensed under either of:
at your option.
PandRS is inspired by the excellent pandas library and incorporates ideas from:
PandRS is a Cool Japan project, bringing high-performance data analysis to the Rust ecosystem.