datacell

Crates.io	datacell
lib.rs	datacell
version	0.1.3
created_at	2025-12-08 17:27:20.80947+00
updated_at	2026-01-22 17:29:16.868042+00
description	A CLI tool for reading, writing, converting XLS and CSV files with formula support
homepage
repository	https://github.com/yingkitw/datacell
max_upload_size
id	1974097
size	637,322

Ying Kit WONG (yingkitw)

documentation

README

datacell

A fast, unified CLI tool for spreadsheet and columnar data manipulation.

The Problem

Working with tabular data often requires juggling multiple tools:

Excel/LibreOffice - GUI-only, slow for batch processing, no scripting
pandas/Python - Requires Python environment, slow startup, memory-heavy
csvkit - CSV-only, no Excel/Parquet/Avro support
xsv - Fast but CSV-only, no formulas
Apache Spark - Overkill for simple transformations, complex setup

Common pain points:

Converting between formats requires different tools
Applying Excel-like formulas to CSV files is awkward
Batch processing spreadsheets in CI/CD pipelines is difficult
No single tool handles CSV, Excel, Parquet, and Avro uniformly

The Solution

datacell is a single, fast CLI tool that:

Reads/writes all major formats: CSV, XLSX, XLS, ODS, Parquet, Avro
Applies Excel-like formulas to any format (SUM, VLOOKUP, IF, etc.)
Performs data operations without code (sort, filter, dedupe, transpose)
Converts between any formats with one command
Outputs to JSON/Markdown for easy integration
Runs as an MCP server for AI assistant integration
Provides pandas-style operations for data manipulation

Why datacell?

Feature	datacell	pandas	csvkit	xsv	Excel
Single binary	✅	❌	❌	✅	❌
CSV support	✅	✅	✅	✅	✅
Excel support	✅	✅	❌	❌	✅
Parquet/Avro	✅	✅	❌	❌	❌
Formulas	✅	❌	❌	❌	✅
CLI-native	✅	❌	✅	✅	❌
Fast startup	✅	❌	❌	✅	❌
Scriptable	✅	✅	✅	✅	❌
No runtime dependencies	✅	❌	✅	✅	❌

Quick Start

# Build
cargo build --release

# Convert CSV to Parquet
datacell convert --input data.csv --output data.parquet

# Apply formula
datacell formula --input sales.csv --output result.csv --formula "SUM(C2:C100)" --cell "D1"

# Filter and sort (SQL-like syntax)
datacell filter --input data.csv --output filtered.csv --where "status = 'active'"
datacell sort --input filtered.csv --output sorted.csv --column date --descending

# Output as JSON for API consumption
datacell read --input report.xlsx --format json > report.json

Features

Read XLSX, ODS, CSV, Parquet, and Avro files
Write data to XLSX, CSV, Parquet, and Avro files
Convert between any formats (CSV, XLSX, ODS, Parquet, Avro)
Apply formulas to cells in both CSV and Excel files
- Supports basic arithmetic operations (+, -, *, /)
- Supports SUM(), AVERAGE(), MIN(), MAX(), COUNT() functions
- Supports ROUND(), ABS(), LEN() functions
- Supports VLOOKUP(), SUMIF(), COUNTIF() functions
- Supports IF() for conditional logic
- Supports CONCAT() for string concatenation
- Supports cell references (e.g., A1, B2)
Data operations
- Sort rows by column (ascending/descending)
- Filter rows by condition
- Find and replace values
- Remove duplicate rows
- Transpose data (rows to columns)
- Merge cells (Excel output)
Pandas-style operations
- Head/tail (first/last n rows)
- Sample random rows
- Select/drop columns
- Describe (summary statistics)
- Value counts
- Group by with aggregations (sum, count, mean, min, max)
- Pivot tables
- Join/merge files (inner, left, right, outer)
- Concatenate files
- Fill/drop missing values
- Rename columns
- Correlation matrix
- Column type inference (dtypes)
- Type casting (astype)
- Unique values
- Dataset info
- SQL-like query with WHERE clause
- Computed columns (mutate)
- Value clipping
- Column normalization (0-1)
- Date parsing and formatting
- Regex filter and replace
Batch processing - process multiple files with glob patterns
Shell completions - bash, zsh, fish, powershell
Config file - .datacell.toml for default options
Styled Excel export - headers, colors, borders, freeze panes, auto-filter
Chart visualization - bar, column, line, area, pie, scatter, doughnut charts
Cell range operations - read/write specific ranges like A1:C10
Multiple output formats - CSV, JSON, Markdown
Multi-sheet support - list sheets, read all sheets at once
Streaming API - process large files efficiently
Progress callbacks - track long-running operations
MCP server for integration with AI assistants

Installation

cargo build --release

The binary will be available at target/release/datacell.

Usage

Read a file

# Read CSV
datacell read --input data.csv

# Read Excel (first sheet)
datacell read --input data.xlsx

# Read specific sheet
datacell read --input data.xlsx --sheet "Sheet2"

# Read specific cell range
datacell read --input data.csv --range "A1:C10"

# Read as JSON
datacell read --input data.csv --format json

# Read range as JSON
datacell read --input data.xlsx --range "B2:D5" --format json

# Read as Markdown table
datacell read --input data.csv --format markdown

# Read Parquet file
datacell read --input data.parquet

# Read Avro file
datacell read --input data.avro

Write a file

# Write CSV from CSV
datacell write --output output.csv --csv input.csv

# Write Excel from CSV
datacell write --output output.xlsx --csv input.csv

# Write Parquet from CSV
datacell write --output output.parquet --csv input.csv

# Write Avro from CSV
datacell write --output output.avro --csv input.csv

# Write Excel with specific sheet name
datacell write --output output.xlsx --csv input.csv --sheet "Data"

Convert between formats

Supports conversion between: CSV, XLSX, XLS, ODS, Parquet, Avro

# CSV to Excel
datacell convert --input data.csv --output data.xlsx

# Excel to CSV
datacell convert --input data.xlsx --output data.csv

# Excel to CSV (specific sheet)
datacell convert --input data.xlsx --output data.csv --sheet "Sheet2"

# CSV to Parquet
datacell convert --input data.csv --output data.parquet

# Parquet to CSV
datacell convert --input data.parquet --output data.csv

# Excel to Avro
datacell convert --input data.xlsx --output data.avro

# Avro to Parquet
datacell convert --input data.avro --output data.parquet

# ODS to CSV
datacell convert --input data.ods --output data.csv

Apply formulas

# Apply SUM formula to CSV
datacell formula --input data.csv --output result.csv --formula "SUM(A1:A10)" --cell "C1"

# Apply arithmetic formula
datacell formula --input data.csv --output result.csv --formula "A1+B1" --cell "C1"

# Apply AVERAGE formula to Excel
datacell formula --input data.xlsx --output result.xlsx --formula "AVERAGE(A1:A10)" --cell "B1" --sheet "Sheet1"

Data operations

# Sort by column A (ascending)
datacell sort --input data.csv --output sorted.csv --column A

# Sort by column B (descending)
datacell sort --input data.csv --output sorted.csv --column B --descending

# Filter rows using SQL-like WHERE clause
datacell filter --input data.csv --output filtered.csv --where "A > 10"
datacell filter --input data.csv --output filtered.csv --where "status = 'active'"

# Find and replace
datacell replace --input data.csv --output replaced.csv --find "old" --replace "new"

# Remove duplicate rows
datacell dedupe --input data.csv --output unique.csv

# Transpose (rows to columns)
datacell transpose --input data.csv --output transposed.csv

# Append data from one file to another
datacell append --source new_data.csv --target existing.csv

# List sheets in Excel/ODS file
datacell sheets --input workbook.xlsx

# Read all sheets at once (as JSON)
datacell read-all --input workbook.xlsx --format json

# Write data to specific cell range
datacell write-range --input data.csv --output result.xlsx --start B2

Pandas-style operations

# First/last n rows
datacell head --input data.csv -n 5
datacell tail --input data.csv -n 5

# Sample random rows
datacell sample --input data.csv -n 10 --seed 42

# Select specific columns
datacell select --input data.csv --output subset.csv --columns "name,age,salary"

# Describe statistics
datacell describe --input data.csv --format markdown

# Value counts
datacell value-counts --input data.csv --column category

# Group by and aggregate
datacell groupby --input sales.csv --output summary.csv --by category --agg "sum:amount,count:id,mean:price"

# Join two files
datacell join --left orders.csv --right customers.csv --output merged.csv --on customer_id --how left

# Concatenate files
datacell concat --inputs "jan.csv,feb.csv,mar.csv" --output q1.csv

# Fill empty values
datacell fillna --input data.csv --output filled.csv --value "N/A"

# Drop rows with empty values
datacell dropna --input data.csv --output clean.csv

# Drop columns
datacell drop --input data.csv --output slim.csv --columns "temp,debug"

# Rename columns
datacell rename --input data.csv --output renamed.csv --from "old_name" --to "new_name"

# Pivot table
datacell pivot --input sales.csv --output pivot.csv --index Category --columns Product --values Price --agg sum

# Correlation matrix
datacell corr --input data.csv --columns "Price,Quantity" --format markdown

# Show column types
datacell dtypes --input data.csv --format markdown

# SQL-like query
datacell query --input data.csv --output filtered.csv -w "Price > 100"

# Add computed column
datacell mutate --input data.csv --output result.csv --column Total --formula "Price * Quantity"

# Cast column type
datacell astype --input data.csv --output result.csv --column Price -t int

# Get unique values
datacell unique --input data.csv --column Category

# Dataset info
datacell info --input data.csv --format markdown

# Clip values to range
datacell clip --input data.csv --output clipped.csv --column Price --min 0 --max 1000

# Normalize column (0-1)
datacell normalize --input data.csv --output normalized.csv --column Price

# Parse and reformat dates
datacell parse-date --input data.csv --output result.csv --column Date --from-format "%Y-%m-%d" --to-format "%d/%m/%Y"

# Filter with regex
datacell regex-filter --input data.csv --output filtered.csv --column Name --pattern "^[A-M]"

# Replace with regex
datacell regex-replace --input data.csv --output result.csv --column Category --pattern "Electronics" --replacement "Tech"

# Batch process multiple files
datacell batch --inputs "data/*.csv" --output-dir processed/ --operation sort --args '{"column":"Price","desc":true}'

# Generate shell completions
datacell completions zsh >> ~/.zshrc
datacell completions bash >> ~/.bashrc
datacell completions fish > ~/.config/fish/completions/datacell.fish

# Initialize config file
datacell config-init --output .datacell.toml

# Export to Excel with styling
datacell export-styled --input data.csv --output styled.xlsx --header-bg 4472C4 --header-font FFFFFF

# Create charts
datacell chart --input sales.csv --output chart.xlsx -t column --title "Sales by Product"
datacell chart --input data.csv --output multi.xlsx -t bar --value-cols "1,2,3" --title "Comparison"
datacell chart --input data.csv --output pie.xlsx -t pie --title "Distribution"

Formula Examples

SUM(A1:A10) - Sum of cells A1 through A10
AVERAGE(A1:A10) - Average of cells A1 through A10
MIN(A1:A10) - Minimum value in range
MAX(A1:A10) - Maximum value in range
COUNT(A1:A10) - Count of numeric cells in range
ROUND(A1, 2) - Round to 2 decimal places
ABS(A1) - Absolute value
LEN(A1) - Length of text in cell
VLOOKUP(2, A1:C10, 3) - Lookup value in table
SUMIF(A1:A10, ">5", B1:B10) - Sum cells matching criteria
COUNTIF(A1:A10, ">5") - Count cells matching criteria
IF(A1>10, "High", "Low") - Conditional logic
CONCAT(A1, " ", B1) - String concatenation
A1+B1 - Add values in A1 and B1
A1-B1 - Subtract B1 from A1
A1*B1 - Multiply A1 by B1
A1/B1 - Divide A1 by B1
A1 - Reference a single cell

Use Cases

Data Pipeline Automation

# Daily ETL: Excel → Parquet for analytics
datacell convert --input daily_report.xlsx --output data/daily_$(date +%Y%m%d).parquet

Report Generation

# Calculate totals and output as Markdown for documentation
datacell formula --input sales.csv --output report.csv --formula "SUM(D2:D100)" --cell "E1"
datacell read --input report.csv --format markdown > REPORT.md

Data Cleaning

# Remove duplicates, filter invalid rows, sort
datacell dedupe --input raw.csv --output clean.csv
datacell filter --input clean.csv --output valid.csv --column status --op "!=" --value "invalid"
datacell sort --input valid.csv --output final.csv --column date

Format Migration

# Migrate legacy Excel files to modern Parquet
for f in *.xlsx; do
  datacell convert --input "$f" --output "${f%.xlsx}.parquet"
done

AI/LLM Integration

# Start MCP server for AI assistant integration
datacell serve

Example Data

See the examples/ folder for sample data files and usage examples.

Architecture

datacell/
├── src/
│   ├── main.rs          # CLI entry point
│   ├── lib.rs           # Library exports
│   ├── cli.rs           # CLI commands + handlers
│   ├── traits.rs        # Core trait definitions
│   ├── excel/           # Excel/ODS file handling
│   ├── csv_handler.rs   # CSV file handling
│   ├── columnar.rs      # Parquet/Avro handling
│   ├── converter.rs     # Format conversion
│   ├── formula/         # Formula evaluation
│   ├── operations/      # Data operations (sort, filter, etc.)
│   ├── common.rs        # Shared utilities
│   ├── config.rs        # Configuration file support
│   ├── validation.rs    # Data validation
│   ├── profiling.rs     # Data profiling
│   ├── quality.rs       # Data quality reports
│   ├── text_analysis.rs # Text analysis
│   ├── timeseries.rs    # Time series operations
│   ├── geospatial.rs    # Geospatial calculations
│   ├── anomaly.rs       # Anomaly detection
│   ├── encryption.rs    # File encryption
│   ├── workflow.rs      # Workflow orchestration
│   ├── api.rs           # REST API server
│   ├── plugins.rs       # Plugin system
│   ├── streaming.rs     # Streaming data processing
│   └── mcp.rs           # MCP server for AI integration
├── tests/               # Integration tests
├── examples/            # Sample data files
├── CLAUDE.md            # AI assistant guide
└── Cargo.toml

Advanced Features

Data Validation

# Validate data against rules
datacell validate --input data.csv --rules rules.json --output validated.csv --report report.json

Data Profiling

# Generate data profile and quality report
datacell profile --input data.csv --output profile.json --report quality_report.md

Text Analysis

# Analyze text content
datacell text-analysis --input data.csv --column content --operation stats
datacell text-analysis --input data.csv --column content --operation sentiment
datacell text-analysis --input data.csv --column content --operation keywords

Time Series Operations

# Resample time series data
datacell resample --input data.csv --output resampled.csv --date-column date --value-column value --interval daily --aggregation sum

Geospatial Operations

# Calculate distance between coordinates
datacell geo-distance --from "40.7128,-74.0060" --to "34.0522,-118.2437" --unit km

Anomaly Detection

# Detect anomalies using statistical methods
datacell detect-anomalies --input data.csv --column value --method zscore --threshold 3.0 --output anomalies.json

Data Encryption

# Encrypt/decrypt data files
datacell encrypt --input data.csv --output encrypted.csv --key secretkey --algorithm aes256
datacell decrypt --input encrypted.csv --output decrypted.csv --key secretkey --algorithm aes256

Workflow Orchestration

# Execute multi-step workflow pipeline
datacell pipeline --config pipeline.toml

REST API Server

# Start REST API server (requires HTTP framework implementation)
datacell api-server --host 127.0.0.1 --port 8080 --cors

Dependencies

Crate	Purpose
`clap`	CLI argument parsing with derive macros
`calamine`	Excel/ODS reading
`rust_xlsxwriter`	Excel writing (XLSX only)
`csv`	CSV handling
`parquet` + `arrow`	Parquet support (v54)
`apache-avro`	Avro support
`rmcp`	MCP server implementation (v0.12)
`tokio`	Async runtime
`serde_json`	JSON output
`regex`	Pattern matching
`chrono`	Date/time handling
`anyhow`	Error handling
`thiserror`	Error types

Testing

# Run all tests
cargo test

# Run specific test
cargo test test_read_xlsx

# Run with output
cargo test -- --nocapture

# Generate test coverage
cargo install cargo-tarpaulin
cargo tarpaulin --out Html

Configuration

Create a .datacell.toml file for default options:

datacell config-init --output .datacell.toml

Example configuration:

[excel]
header_bold = true
header_bg_color = "4472C4"
header_font_color = "FFFFFF"
auto_filter = true
freeze_header = true
auto_fit = true

[output]
default_format = "csv"
include_headers = true

License

Apache-2.0

Contributing

See CLAUDE.md for development guidance and architectural details.

Commit count: 13

datacell

documentation

README

datacell

The Problem

The Solution

Why datacell?

Quick Start

Features

Installation

Usage

Read a file

Write a file

Convert between formats

Apply formulas

Data operations

Pandas-style operations

Formula Examples

Use Cases

Data Pipeline Automation

Report Generation

Data Cleaning

Format Migration

AI/LLM Integration

Example Data

Architecture

Advanced Features

Data Validation

Data Profiling

Text Analysis

Time Series Operations

Geospatial Operations

Anomaly Detection

Data Encryption

Workflow Orchestration

REST API Server

Dependencies

Testing

Configuration

License

Contributing

cargo fmt