| Crates.io | data-transform |
| lib.rs | data-transform |
| version | 0.1.2 |
| created_at | 2025-12-29 17:29:24.725765+00 |
| updated_at | 2026-01-01 18:10:04.30099+00 |
| description | A data transformation tool for working with tabular data |
| homepage | https://github.com/system0x7/dt |
| repository | https://github.com/system0x7/dt |
| max_upload_size | |
| id | 2011012 |
| size | 249,522 |
A fast, readable data transformation tool for working with tabular data. Built with Rust and Polars.
Clearer than pandas, faster than awk, zero setup.
# Shell installer (macOS/Linux)
curl --proto '=https' --tlsv1.2 -LsSf https://github.com/system0x7/dt/releases/latest/download/data-transform-installer.sh | sh
# Cargo
cargo install data-transform
# From source
cargo install --git https://github.com/system0x7/dt
$ dt
>> data = read('sales.csv')
Stored: data (7 rows × 4 cols)
[Table: 7 rows × 4 cols]
shape: (5, 4)
┌──────────┬─────────────┬────────┬──────────┐
│ product ┆ category ┆ price ┆ quantity │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ f64 ┆ i64 │
╞══════════╪═════════════╪════════╪══════════╡
│ Laptop ┆ Electronics ┆ 899.99 ┆ 5 │
│ Mouse ┆ Electronics ┆ 24.99 ┆ 120 │
│ Desk ┆ Furniture ┆ 299.0 ┆ 8 │
│ Chair ┆ Furniture ┆ 159.99 ┆ 15 │
│ Keyboard ┆ Electronics ┆ 79.99 ┆ 45 │
└──────────┴─────────────┴────────┴──────────┘
... 2 more rows
>> data = data | filter(price > 100)
Stored: data (5 rows × 4 cols)
[Table: 5 rows × 4 cols]
shape: (5, 4)
┌───────────┬─────────────┬────────┬──────────┐
│ product ┆ category ┆ price ┆ quantity │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ f64 ┆ i64 │
╞═══════════╪═════════════╪════════╪══════════╡
│ Laptop ┆ Electronics ┆ 899.99 ┆ 5 │
│ Desk ┆ Furniture ┆ 299.0 ┆ 8 │
│ Chair ┆ Furniture ┆ 159.99 ┆ 15 │
│ Monitor ┆ Electronics ┆ 349.99 ┆ 12 │
│ Bookshelf ┆ Furniture ┆ 189.0 ┆ 6 │
└───────────┴─────────────┴────────┴──────────┘
>> data = data | mutate(total = price * quantity)
Stored: data (5 rows × 5 cols)
[Table: 5 rows × 5 cols]
shape: (5, 5)
┌───────────┬─────────────┬────────┬──────────┬─────────┐
│ product ┆ category ┆ price ┆ quantity ┆ total │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ f64 ┆ i64 ┆ f64 │
╞═══════════╪═════════════╪════════╪══════════╪═════════╡
│ Laptop ┆ Electronics ┆ 899.99 ┆ 5 ┆ 4499.95 │
│ Desk ┆ Furniture ┆ 299.0 ┆ 8 ┆ 2392.0 │
│ Chair ┆ Furniture ┆ 159.99 ┆ 15 ┆ 2399.85 │
│ Monitor ┆ Electronics ┆ 349.99 ┆ 12 ┆ 4199.88 │
│ Bookshelf ┆ Furniture ┆ 189.0 ┆ 6 ┆ 1134.0 │
└───────────┴─────────────┴────────┴──────────┴─────────┘
>> data = data | select(product, category, total)
Stored: data (5 rows × 3 cols)
[Table: 5 rows × 3 cols]
shape: (5, 3)
┌───────────┬─────────────┬─────────┐
│ product ┆ category ┆ total │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ f64 │
╞═══════════╪═════════════╪═════════╡
│ Laptop ┆ Electronics ┆ 4499.95 │
│ Desk ┆ Furniture ┆ 2392.0 │
│ Chair ┆ Furniture ┆ 2399.85 │
│ Monitor ┆ Electronics ┆ 4199.88 │
│ Bookshelf ┆ Furniture ┆ 1134.0 │
└───────────┴─────────────┴─────────┘
>> data | write('revenue.csv')
Written: revenue.csv (5 rows × 3 cols)
The REPL shows the table after each operation, letting you verify transformations before saving.
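For readers who think in a general-purpose language, the session above can be mirrored in plain Python. This is only a sketch of what each pipeline stage computes, not how dt or Polars implement it. The seven rows are the products visible in the session output; their order in `sales.csv` is an assumption, and the `rows` list stands in for the file.

```python
# Rows taken from the REPL output above; their order in the file is assumed.
rows = [
    {"product": "Laptop", "category": "Electronics", "price": 899.99, "quantity": 5},
    {"product": "Mouse", "category": "Electronics", "price": 24.99, "quantity": 120},
    {"product": "Desk", "category": "Furniture", "price": 299.0, "quantity": 8},
    {"product": "Chair", "category": "Furniture", "price": 159.99, "quantity": 15},
    {"product": "Keyboard", "category": "Electronics", "price": 79.99, "quantity": 45},
    {"product": "Monitor", "category": "Electronics", "price": 349.99, "quantity": 12},
    {"product": "Bookshelf", "category": "Furniture", "price": 189.0, "quantity": 6},
]

# filter(price > 100): keep only the rows that satisfy the predicate
filtered = [r for r in rows if r["price"] > 100]

# mutate(total = price * quantity): add a derived column
# (rounded to cents here to hide floating-point noise)
mutated = [{**r, "total": round(r["price"] * r["quantity"], 2)} for r in filtered]

# select(product, category, total): keep the named columns, in that order
selected = [{k: r[k] for k in ("product", "category", "total")} for r in mutated]

for r in selected:
    print(r)
```

Each stage produces a new table rather than modifying the previous one, which is what makes it possible to inspect the intermediate result after every step.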
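# One-liner: pass the whole pipeline as a single argument
dt "read('data.csv') | filter(price > 100) | write('output.csv')"
# Script: run the statements stored in a file
dt -f transform.dt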
# Load reference data
pops = read('pops.txt', header=false)
ref = read('reference.ind', header=false)
fam = read('reference.fam', header=false)
# Filter and join
keep = ref | filter($3 in pops)
result = fam | filter($2 in keep) | select($1, $2)
# Save results
result | write('output.tsv') # .tsv extension auto-uses tab delimiter
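The `in` filters in the script above behave like a membership test against another table. The Python sketch below shows one plausible reading, under the loud assumption that `x in table` checks `x` against the values in the table's first column; the source does not state which column is used. All rows are made-up placeholders for the three headerless files, and `first_column` is a hypothetical helper.

```python
# Hypothetical, made-up rows standing in for the three headerless files.
pops = [["CEU"], ["YRI"]]
ref = [["S1", "M", "CEU"], ["S2", "F", "JPT"], ["S3", "F", "YRI"]]
fam = [["F1", "S1"], ["F2", "S2"], ["F3", "S3"]]


def first_column(table):
    """Return the set of values in column $1 (assumed to be what `in` tests against)."""
    return {row[0] for row in table}


# keep = ref | filter($3 in pops)      ($3 is index 2 in Python)
wanted = first_column(pops)
keep = [row for row in ref if row[2] in wanted]

# result = fam | filter($2 in keep) | select($1, $2)
kept_ids = first_column(keep)
result = [[row[0], row[1]] for row in fam if row[1] in kept_ids]

print(result)
```

Building a set first keeps each membership check constant-time, so the whole filter is a single pass over the larger table.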
select($1, $2, $3) # By position (1-based)
select($1..$5) # Range (inclusive)
select(name, age, email) # By name
select($1 as id) # With renaming
drop($3..$7) # Remove columns
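Positional selectors are 1-based and ranges include both endpoints, which differs from the 0-based, half-open slices of many languages. The sketch below maps the two conventions onto each other in Python; `select_range` and `drop_range` are hypothetical helper names used only for illustration.

```python
def select_range(row, start, end):
    """Mimic select($start..$end): 1-based positions, both ends inclusive."""
    if start < 1 or end < start:
        raise ValueError("positions are 1-based and start must not exceed end")
    return row[start - 1:end]


def drop_range(row, start, end):
    """Mimic drop($start..$end): remove the same 1-based inclusive span."""
    if start < 1 or end < start:
        raise ValueError("positions are 1-based and start must not exceed end")
    return row[:start - 1] + row[end:]


row = ["c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8"]
print(select_range(row, 1, 5))  # columns $1 through $5
print(drop_range(row, 3, 7))    # everything except $3 through $7
```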
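filter(age > 30) # Numeric comparison
filter(name == "Alice") # String equality
filter($3 in populations) # Membership in a stored variable
sort(age desc) # Descending order
distinct(user_id) # Deduplicate by user_id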
mutate(total = price * quantity)
mutate(full_name = first + " " + last)
mutate(domain = split(email, '@')[1])
mutate(label = lookup(labels, id, on='id', return='name'))
mutate(clean = replace(text, 'old', 'new'))
mutate(parts = split(id, ':')[0])
mutate(combined = $1 + ':' + $2)
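Note that the index after `split(...)` appears to be zero-based, unlike column positions: in the examples above `[1]` picks the domain after the `@`, and `[0]` picks the part before the `:`. A minimal Python equivalent, using made-up sample values and a hypothetical `split_part` helper, looks like this. How dt handles an out-of-range index is not stated in the source; Python raises `IndexError`.

```python
def split_part(value, sep, index):
    """Mimic split(value, sep)[index] with a zero-based index."""
    return value.split(sep)[index]


# Made-up sample values for illustration
print(split_part("alice@example.com", "@", 1))  # part after the '@'
print(split_part("chr1:12345", ":", 0))         # part before the ':'

# mutate(combined = $1 + ':' + $2) is plain string concatenation
print("chr1" + ":" + "12345")
```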
rename(old_name -> new_name)
rename_all(replace('_', '-'))
rename_all('PC' + 1..50) # PC1, PC2, ..., PC50
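The two `rename_all` forms above either apply one string operation to every column name or generate names from a prefix and an inclusive counter. In Python terms, assuming a made-up list of column names, they correspond roughly to:

```python
# rename_all(replace('_', '-')): apply one replacement to every column name
columns = ["sample_id", "birth_date", "pop"]  # made-up names for illustration
renamed = [name.replace("_", "-") for name in columns]

# rename_all('PC' + 1..50): prefix plus a counter from 1 to 50, both inclusive
pc_names = [f"PC{i}" for i in range(1, 51)]

print(renamed)
print(pc_names[0], pc_names[-1], len(pc_names))
```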
# Load reference table
labels = read('labels.csv')
# Lookup values (single-line)
data = read('samples.csv') | mutate(population = lookup(labels, sample_id, on='id', return='pop'), region = lookup(labels, sample_id, on='id', return='region'))
# Or split at pipe boundaries for readability
data = read('samples.csv') |
mutate(population = lookup(labels, sample_id, on='id', return='pop'), region = lookup(labels, sample_id, on='id', return='region'))
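Conceptually, `lookup` resembles a keyed dictionary fetch: for each row, find the reference row whose `on` column equals the key and return the requested column. The Python sketch below illustrates that idea with made-up rows. Two behaviors are assumptions not confirmed by the source: an unmatched key yields `None`, and the first match wins when the reference table has duplicate keys. `build_index` is a hypothetical helper.

```python
def build_index(table, on):
    """Index reference rows by the `on` column; the first match wins (assumption)."""
    index = {}
    for row in table:
        index.setdefault(row[on], row)
    return index


# Made-up rows standing in for labels.csv and samples.csv
labels = [
    {"id": "S1", "pop": "CEU", "region": "Europe"},
    {"id": "S2", "pop": "YRI", "region": "Africa"},
]
samples = [{"sample_id": "S1"}, {"sample_id": "S2"}, {"sample_id": "S9"}]

by_id = build_index(labels, on="id")

# mutate(population = lookup(..., return='pop'), region = lookup(..., return='region'))
data = [
    {
        **row,
        "population": by_id.get(row["sample_id"], {}).get("pop"),
        "region": by_id.get(row["sample_id"], {}).get("region"),
    }
    for row in samples
]

for row in data:
    print(row)
```

Building the index once and reusing it for both columns avoids rescanning the reference table for every row.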
See REFERENCE for complete syntax and examples.
- JSON (.json) - Structured JSON data
- Parquet (.parquet) - Columnar format

Delimited text files - Delimiter auto-detected for any file:

- .csv - Defaults to comma, auto-detects if ambiguous
- .tsv - Defaults to tab, auto-detects if ambiguous
- Other (.txt, .dat, .psv, etc.) - Auto-detects delimiter

Auto-detection analyzes file content and identifies: comma, tab, pipe, semicolon, or space.
Override auto-detection if needed:
read('data.txt', delimiter=' ')
read('data.psv', delimiter='|')
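To give a feel for how content-based delimiter detection can work, here is a deliberately simple Python heuristic. It is not dt's actual algorithm, which the source does not describe. It picks the candidate that occurs the same non-zero number of times on every non-empty line, preferring the one with the most occurrences; `detect_delimiter` is a hypothetical name.

```python
# The five candidates listed above: comma, tab, pipe, semicolon, space
CANDIDATES = [",", "\t", "|", ";", " "]


def detect_delimiter(text, candidates=CANDIDATES):
    """Return the candidate with a consistent non-zero count per line, or None."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    best, best_count = None, 0
    for delim in candidates:
        counts = {ln.count(delim) for ln in lines}
        # A plausible delimiter splits every line into the same number of fields
        if len(counts) == 1:
            n = counts.pop()
            if n > best_count:
                best, best_count = delim, n
    return best


print(repr(detect_delimiter("a|b|c\n1|2|3\n")))
print(repr(detect_delimiter("name;city\nAnn Lee;Oslo\nBob;Rome\n")))
```

A heuristic like this returns `None` when no candidate is consistent, for example with quoted fields that contain the delimiter. That is the kind of case where an explicit `delimiter=` override is useful.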
Built on Polars, dt provides:
For typical data transformation tasks, dt is 5-10x faster than awk while being significantly more readable.
- .help - Show help
- .schema - Show current table schema
- .vars - Show stored variables
- .history - Show operation history
- .undo [n] - Undo operations
- .clear - Clear current state
- .exit - Exit REPL

License: MIT