| Crates.io | datacell |
| lib.rs | datacell |
| version | 0.1.3 |
| created_at | 2025-12-08 17:27:20.80947+00 |
| updated_at | 2026-01-22 17:29:16.868042+00 |
| description | A CLI tool for reading, writing, converting XLS and CSV files with formula support |
| homepage | |
| repository | https://github.com/yingkitw/datacell |
| max_upload_size | |
| id | 1974097 |
| size | 637,322 |
A fast, unified CLI tool for spreadsheet and columnar data manipulation.
Working with tabular data often requires juggling multiple tools (pandas scripts, csvkit, xsv, spreadsheet apps), each with its own setup cost, startup time, and format gaps. datacell addresses these pain points with a single, fast CLI tool, compared below against common alternatives:
| Feature | datacell | pandas | csvkit | xsv | Excel |
|---|---|---|---|---|---|
| Single binary | ✅ | ❌ | ❌ | ✅ | ❌ |
| CSV support | ✅ | ✅ | ✅ | ✅ | ✅ |
| Excel support | ✅ | ✅ | ❌ | ❌ | ✅ |
| Parquet/Avro | ✅ | ✅ | ❌ | ❌ | ❌ |
| Formulas | ✅ | ❌ | ❌ | ❌ | ✅ |
| CLI-native | ✅ | ❌ | ✅ | ✅ | ❌ |
| Fast startup | ✅ | ❌ | ❌ | ✅ | ❌ |
| Scriptable | ✅ | ✅ | ✅ | ✅ | ❌ |
| No runtime dependencies | ✅ | ❌ | ✅ | ✅ | ❌ |
# Build
cargo build --release
# Convert CSV to Parquet
datacell convert --input data.csv --output data.parquet
# Apply formula
datacell formula --input sales.csv --output result.csv --formula "SUM(C2:C100)" --cell "D1"
# Filter and sort (SQL-like syntax)
datacell filter --input data.csv --output filtered.csv --where "status = 'active'"
datacell sort --input filtered.csv --output sorted.csv --column date --descending
# Output as JSON for API consumption
datacell read --input report.xlsx --format json > report.json
# Build from source
cargo build --release
The binary will be available at target/release/datacell.
# Read CSV
datacell read --input data.csv
# Read Excel (first sheet)
datacell read --input data.xlsx
# Read specific sheet
datacell read --input data.xlsx --sheet "Sheet2"
# Read specific cell range
datacell read --input data.csv --range "A1:C10"
# Read as JSON
datacell read --input data.csv --format json
# Read range as JSON
datacell read --input data.xlsx --range "B2:D5" --format json
# Read as Markdown table
datacell read --input data.csv --format markdown
# Read Parquet file
datacell read --input data.parquet
# Read Avro file
datacell read --input data.avro
# Write CSV from CSV
datacell write --output output.csv --csv input.csv
# Write Excel from CSV
datacell write --output output.xlsx --csv input.csv
# Write Parquet from CSV
datacell write --output output.parquet --csv input.csv
# Write Avro from CSV
datacell write --output output.avro --csv input.csv
# Write Excel with specific sheet name
datacell write --output output.xlsx --csv input.csv --sheet "Data"
Supports conversion between: CSV, XLSX, XLS, ODS, Parquet, Avro
# CSV to Excel
datacell convert --input data.csv --output data.xlsx
# Excel to CSV
datacell convert --input data.xlsx --output data.csv
# Excel to CSV (specific sheet)
datacell convert --input data.xlsx --output data.csv --sheet "Sheet2"
# CSV to Parquet
datacell convert --input data.csv --output data.parquet
# Parquet to CSV
datacell convert --input data.parquet --output data.csv
# Excel to Avro
datacell convert --input data.xlsx --output data.avro
# Avro to Parquet
datacell convert --input data.avro --output data.parquet
# ODS to CSV
datacell convert --input data.ods --output data.csv
# Apply SUM formula to CSV
datacell formula --input data.csv --output result.csv --formula "SUM(A1:A10)" --cell "C1"
# Apply arithmetic formula
datacell formula --input data.csv --output result.csv --formula "A1+B1" --cell "C1"
# Apply AVERAGE formula to Excel
datacell formula --input data.xlsx --output result.xlsx --formula "AVERAGE(A1:A10)" --cell "B1" --sheet "Sheet1"
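Range formulas like the ones above reduce to aggregations over the referenced cells. A minimal Python sketch of the SUM/AVERAGE semantics (illustrative only; datacell itself is implemented in Rust, and this is not its actual code):

```python
# Illustrative sketch of SUM/AVERAGE range semantics, not datacell's actual code.
def eval_range(cells, func):
    """Aggregate the numeric values in a cell range, skipping non-numeric cells."""
    nums = [c for c in cells if isinstance(c, (int, float))]
    if func == "SUM":
        return sum(nums)
    if func == "AVERAGE":
        return sum(nums) / len(nums) if nums else 0
    raise ValueError(f"unsupported function: {func}")

column_a = [10, 20, 30, "n/a", 40]          # A1:A5 with one text cell
total = eval_range(column_a, "SUM")          # 100
average = eval_range(column_a, "AVERAGE")    # 25.0
```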
# Sort by column A (ascending)
datacell sort --input data.csv --output sorted.csv --column A
# Sort by column B (descending)
datacell sort --input data.csv --output sorted.csv --column B --descending
# Filter rows using SQL-like WHERE clause
datacell filter --input data.csv --output filtered.csv --where "A > 10"
datacell filter --input data.csv --output filtered.csv --where "status = 'active'"
# Find and replace
datacell replace --input data.csv --output replaced.csv --find "old" --replace "new"
# Remove duplicate rows
datacell dedupe --input data.csv --output unique.csv
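Conceptually, dedupe keeps the first occurrence of each row and preserves the original order, which can be sketched in Python as:

```python
# Order-preserving de-duplication: the first occurrence of each row wins.
rows = [("a", 1), ("b", 2), ("a", 1), ("c", 3), ("b", 2)]
unique_rows = list(dict.fromkeys(rows))  # dict keys preserve insertion order
# unique_rows: [('a', 1), ('b', 2), ('c', 3)]
```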
# Transpose (rows to columns)
datacell transpose --input data.csv --output transposed.csv
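Transposition swaps rows and columns; for a rectangular table the same operation in Python is a zip over the rows:

```python
# Transpose a rectangular table: row i, column j becomes row j, column i.
rows = [["name", "q1", "q2"],
        ["alice", 10, 20],
        ["bob", 30, 40]]
transposed = [list(col) for col in zip(*rows)]
# transposed: [['name', 'alice', 'bob'], ['q1', 10, 30], ['q2', 20, 40]]
```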
# Append data from one file to another
datacell append --source new_data.csv --target existing.csv
# List sheets in Excel/ODS file
datacell sheets --input workbook.xlsx
# Read all sheets at once (as JSON)
datacell read-all --input workbook.xlsx --format json
# Write data to specific cell range
datacell write-range --input data.csv --output result.xlsx --start B2
# First/last n rows
datacell head --input data.csv -n 5
datacell tail --input data.csv -n 5
# Sample random rows
datacell sample --input data.csv -n 10 --seed 42
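Fixing the seed makes the sample reproducible: the same seed always selects the same rows. The equivalent behaviour in Python:

```python
import random

rows = list(range(100))                      # stand-in for data rows
sample = random.Random(42).sample(rows, 10)  # seeded, so reproducible
rerun = random.Random(42).sample(rows, 10)
# Same seed, same selection: sample == rerun
```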
# Select specific columns
datacell select --input data.csv --output subset.csv --columns "name,age,salary"
# Describe statistics
datacell describe --input data.csv --format markdown
# Value counts
datacell value-counts --input data.csv --column category
# Group by and aggregate
datacell groupby --input sales.csv --output summary.csv --by category --agg "sum:amount,count:id,mean:price"
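The `--agg` string pairs an aggregate with a column (`sum:amount`, `count:id`, `mean:price`). A Python sketch of that grouping logic, using hypothetical rows standing in for `sales.csv`:

```python
from collections import defaultdict

# Hypothetical sample rows standing in for sales.csv.
sales = [
    {"category": "tools", "amount": 100, "price": 10},
    {"category": "tools", "amount": 50,  "price": 20},
    {"category": "toys",  "amount": 30,  "price": 5},
]

groups = defaultdict(list)
for row in sales:
    groups[row["category"]].append(row)

summary = {
    cat: {
        "sum_amount": sum(r["amount"] for r in rows),
        "count": len(rows),
        "mean_price": sum(r["price"] for r in rows) / len(rows),
    }
    for cat, rows in groups.items()
}
# summary["tools"] == {"sum_amount": 150, "count": 2, "mean_price": 15.0}
```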
# Join two files
datacell join --left orders.csv --right customers.csv --output merged.csv --on customer_id --how left
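A left join keeps every row from the left file and fills in matching columns from the right, leaving them empty when no match exists. Sketched in Python with hypothetical rows:

```python
# Left join: every order is kept; unmatched customers yield name=None.
orders = [{"customer_id": 1, "total": 50},
          {"customer_id": 2, "total": 75}]
customers = [{"customer_id": 1, "name": "Ada"}]

by_id = {c["customer_id"]: c for c in customers}
merged = [{**o, **by_id.get(o["customer_id"], {"name": None})} for o in orders]
# merged[1]["name"] is None because customer 2 has no match
```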
# Concatenate files
datacell concat --inputs "jan.csv,feb.csv,mar.csv" --output q1.csv
# Fill empty values
datacell fillna --input data.csv --output filled.csv --value "N/A"
# Drop rows with empty values
datacell dropna --input data.csv --output clean.csv
# Drop columns
datacell drop --input data.csv --output slim.csv --columns "temp,debug"
# Rename columns
datacell rename --input data.csv --output renamed.csv --from "old_name" --to "new_name"
# Pivot table
datacell pivot --input sales.csv --output pivot.csv --index Category --columns Product --values Price --agg sum
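A pivot groups by the `--index` column, spreads the `--columns` values across the header, and aggregates `--values`. The `sum` aggregation can be sketched as:

```python
# Pivot with sum aggregation: index=Category, columns=Product, values=Price.
sales = [
    {"Category": "tools", "Product": "hammer", "Price": 10},
    {"Category": "tools", "Product": "saw",    "Price": 20},
    {"Category": "tools", "Product": "hammer", "Price": 5},
]

pivot = {}
for r in sales:
    row = pivot.setdefault(r["Category"], {})
    row[r["Product"]] = row.get(r["Product"], 0) + r["Price"]
# pivot == {"tools": {"hammer": 15, "saw": 20}}
```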
# Correlation matrix
datacell corr --input data.csv --columns "Price,Quantity" --format markdown
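The correlation matrix is built from pairwise Pearson coefficients. A self-contained Python sketch of one coefficient:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

prices = [10, 20, 30, 40]
quantities = [40, 30, 20, 10]
r = pearson(prices, quantities)  # ≈ -1.0: a perfectly inverse relationship
```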
# Show column types
datacell dtypes --input data.csv --format markdown
# SQL-like query
datacell query --input data.csv --output filtered.csv -w "Price > 100"
# Add computed column
datacell mutate --input data.csv --output result.csv --column Total --formula "Price * Quantity"
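mutate evaluates the formula once per row to produce the new column. Row-wise, the operation amounts to:

```python
# Row-wise computed column: Total = Price * Quantity for every row.
rows = [{"Price": 10, "Quantity": 3},
        {"Price": 5,  "Quantity": 2}]
for row in rows:
    row["Total"] = row["Price"] * row["Quantity"]
# rows[0]["Total"] == 30, rows[1]["Total"] == 10
```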
# Cast column type
datacell astype --input data.csv --output result.csv --column Price -t int
# Get unique values
datacell unique --input data.csv --column Category
# Dataset info
datacell info --input data.csv --format markdown
# Clip values to range
datacell clip --input data.csv --output clipped.csv --column Price --min 0 --max 1000
# Normalize column (0-1)
datacell normalize --input data.csv --output normalized.csv --column Price
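Min-max normalization maps the column onto [0, 1] via (x - min) / (max - min):

```python
# Min-max normalization: the smallest value maps to 0.0, the largest to 1.0.
prices = [5, 10, 20, 25]
lo, hi = min(prices), max(prices)
normalized = [(p - lo) / (hi - lo) for p in prices]
# normalized == [0.0, 0.25, 0.75, 1.0]
```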
# Parse and reformat dates
datacell parse-date --input data.csv --output result.csv --column Date --from-format "%Y-%m-%d" --to-format "%d/%m/%Y"
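The from/to format strings follow the usual strftime directives, so the reformatting is a parse followed by a re-render:

```python
from datetime import datetime

# Parse with the source format, re-render with the target format.
raw = "2025-12-08"
reformatted = datetime.strptime(raw, "%Y-%m-%d").strftime("%d/%m/%Y")
# reformatted == "08/12/2025"
```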
# Filter with regex
datacell regex-filter --input data.csv --output filtered.csv --column Name --pattern "^[A-M]"
# Replace with regex
datacell regex-replace --input data.csv --output result.csv --column Category --pattern "Electronics" --replacement "Tech"
# Batch process multiple files
datacell batch --inputs "data/*.csv" --output-dir processed/ --operation sort --args '{"column":"Price","desc":true}'
# Generate shell completions
datacell completions zsh >> ~/.zshrc
datacell completions bash >> ~/.bashrc
datacell completions fish > ~/.config/fish/completions/datacell.fish
# Initialize config file
datacell config-init --output .datacell.toml
# Export to Excel with styling
datacell export-styled --input data.csv --output styled.xlsx --header-bg 4472C4 --header-font FFFFFF
# Create charts
datacell chart --input sales.csv --output chart.xlsx -t column --title "Sales by Product"
datacell chart --input data.csv --output multi.xlsx -t bar --value-cols "1,2,3" --title "Comparison"
datacell chart --input data.csv --output pie.xlsx -t pie --title "Distribution"
- SUM(A1:A10) - Sum of cells A1 through A10
- AVERAGE(A1:A10) - Average of cells A1 through A10
- MIN(A1:A10) - Minimum value in range
- MAX(A1:A10) - Maximum value in range
- COUNT(A1:A10) - Count of numeric cells in range
- ROUND(A1, 2) - Round to 2 decimal places
- ABS(A1) - Absolute value
- LEN(A1) - Length of text in cell
- VLOOKUP(2, A1:C10, 3) - Lookup value in table
- SUMIF(A1:A10, ">5", B1:B10) - Sum cells matching criteria
- COUNTIF(A1:A10, ">5") - Count cells matching criteria
- IF(A1>10, "High", "Low") - Conditional logic
- CONCAT(A1, " ", B1) - String concatenation
- A1+B1 - Add values in A1 and B1
- A1-B1 - Subtract B1 from A1
- A1*B1 - Multiply A1 by B1
- A1/B1 - Divide A1 by B1
- A1 - Reference a single cell

# Daily ETL: Excel → Parquet for analytics
datacell convert --input daily_report.xlsx --output data/daily_$(date +%Y%m%d).parquet
# Calculate totals and output as Markdown for documentation
datacell formula --input sales.csv --output report.csv --formula "SUM(D2:D100)" --cell "E1"
datacell read --input report.csv --format markdown > REPORT.md
# Remove duplicates, filter invalid rows, sort
datacell dedupe --input raw.csv --output clean.csv
datacell filter --input clean.csv --output valid.csv --where "status != 'invalid'"
datacell sort --input valid.csv --output final.csv --column date
# Migrate legacy Excel files to modern Parquet
for f in *.xlsx; do
datacell convert --input "$f" --output "${f%.xlsx}.parquet"
done
# Start MCP server for AI assistant integration
datacell serve
See the examples/ folder for sample data files and usage examples.
datacell/
├── src/
│ ├── main.rs # CLI entry point
│ ├── lib.rs # Library exports
│ ├── cli.rs # CLI commands + handlers
│ ├── traits.rs # Core trait definitions
│ ├── excel/ # Excel/ODS file handling
│ ├── csv_handler.rs # CSV file handling
│ ├── columnar.rs # Parquet/Avro handling
│ ├── converter.rs # Format conversion
│ ├── formula/ # Formula evaluation
│ ├── operations/ # Data operations (sort, filter, etc.)
│ ├── common.rs # Shared utilities
│ ├── config.rs # Configuration file support
│ ├── validation.rs # Data validation
│ ├── profiling.rs # Data profiling
│ ├── quality.rs # Data quality reports
│ ├── text_analysis.rs # Text analysis
│ ├── timeseries.rs # Time series operations
│ ├── geospatial.rs # Geospatial calculations
│ ├── anomaly.rs # Anomaly detection
│ ├── encryption.rs # File encryption
│ ├── workflow.rs # Workflow orchestration
│ ├── api.rs # REST API server
│ ├── plugins.rs # Plugin system
│ ├── streaming.rs # Streaming data processing
│ └── mcp.rs # MCP server for AI integration
├── tests/ # Integration tests
├── examples/ # Sample data files
├── CLAUDE.md # AI assistant guide
└── Cargo.toml
# Validate data against rules
datacell validate --input data.csv --rules rules.json --output validated.csv --report report.json
# Generate data profile and quality report
datacell profile --input data.csv --output profile.json --report quality_report.md
# Analyze text content
datacell text-analysis --input data.csv --column content --operation stats
datacell text-analysis --input data.csv --column content --operation sentiment
datacell text-analysis --input data.csv --column content --operation keywords
# Resample time series data
datacell resample --input data.csv --output resampled.csv --date-column date --value-column value --interval daily --aggregation sum
# Calculate distance between coordinates
datacell geo-distance --from "40.7128,-74.0060" --to "34.0522,-118.2437" --unit km
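Distances like this are typically computed with the haversine formula on a spherical Earth; whether datacell uses haversine exactly is an assumption, but the standard calculation looks like:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres, assuming a spherical Earth."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# New York to Los Angeles: roughly 3,900-4,000 km great-circle.
d = haversine_km(40.7128, -74.0060, 34.0522, -118.2437)
```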
# Detect anomalies using statistical methods
datacell detect-anomalies --input data.csv --column value --method zscore --threshold 3.0 --output anomalies.json
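The zscore method flags values whose distance from the column mean exceeds `--threshold` standard deviations. A Python sketch with made-up values:

```python
import statistics

# Flag values more than 3 population standard deviations from the mean.
values = [9, 10, 11, 10, 9, 11, 10, 10, 9, 11, 10, 10, 100]
mean = statistics.mean(values)
stdev = statistics.pstdev(values)
anomalies = [v for v in values if abs(v - mean) / stdev > 3.0]
# anomalies == [100]
```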
# Encrypt/decrypt data files
datacell encrypt --input data.csv --output encrypted.csv --key secretkey --algorithm aes256
datacell decrypt --input encrypted.csv --output decrypted.csv --key secretkey --algorithm aes256
# Execute multi-step workflow pipeline
datacell pipeline --config pipeline.toml
# Start REST API server (requires HTTP framework implementation)
datacell api-server --host 127.0.0.1 --port 8080 --cors
| Crate | Purpose |
|---|---|
| clap | CLI argument parsing with derive macros |
| calamine | Excel/ODS reading |
| rust_xlsxwriter | Excel writing (XLSX only) |
| csv | CSV handling |
| parquet + arrow | Parquet support (v54) |
| apache-avro | Avro support |
| rmcp | MCP server implementation (v0.12) |
| tokio | Async runtime |
| serde_json | JSON output |
| regex | Pattern matching |
| chrono | Date/time handling |
| anyhow | Error handling |
| thiserror | Error types |
# Run all tests
cargo test
# Run specific test
cargo test test_read_xlsx
# Run with output
cargo test -- --nocapture
# Generate test coverage
cargo install cargo-tarpaulin
cargo tarpaulin --out Html
Create a .datacell.toml file for default options:
datacell config-init --output .datacell.toml
Example configuration:
[excel]
header_bold = true
header_bg_color = "4472C4"
header_font_color = "FFFFFF"
auto_filter = true
freeze_header = true
auto_fit = true
[output]
default_format = "csv"
include_headers = true
Apache-2.0
See CLAUDE.md for development guidance and architectural details.