| Crates.io | pirkle |
| lib.rs | pirkle |
| version | 0.4.8 |
| created_at | 2025-04-26 20:43:43.043516+00 |
| updated_at | 2025-05-24 20:15:03.86583+00 |
| description | A CLI tool to query files using PRQL |
| homepage | https://github.com/dloss/pirkle |
| repository | https://github.com/dloss/pirkle |
| max_upload_size | |
| id | 1650593 |
| size | 148,237 |
Pirkle is a fast, lightweight command-line tool that brings the power of PRQL (Pipelined Relational Query Language) to CSV and SQLite files. Transform, filter, and join your data with expressive, readable queries that compile to optimized SQL.
Why Pirkle?
PRQL pipelines read top to bottom and compile to plain SQL, so queries stay short and readable while SQLite does the heavy lifting. There is no database to set up: point Pirkle at your CSV or SQLite files and query them directly.
Installation
Download the latest release for your platform:
| Platform | Download |
|---|---|
| Windows | pirkle-x86_64-pc-windows-msvc.zip |
| macOS (Apple Silicon) | pirkle-aarch64-apple-darwin.tar.gz |
| macOS (Intel) | pirkle-x86_64-apple-darwin.tar.gz |
| Linux (x86_64) | pirkle-x86_64-unknown-linux-musl.tar.gz |
# Cargo (Rust)
cargo install pirkle
# Homebrew (coming soon)
# brew install pirkle
Or build and install from source:
git clone https://github.com/dloss/pirkle.git
cd pirkle
cargo install --path .
Quick Start
# Install pirkle
cargo install pirkle
# Query a CSV file
pirkle examples/data.csv --query "from data | filter price > 100 | select {name, price}"
# View file structure
pirkle examples/data.csv --schema
# Query a CSV file. CSV files are auto-loaded as SQLite tables.
$ pirkle examples/employees.csv --query "from employees | filter country == 'USA' | select {name, age}"
name age
---------------------
John Smith 32
Robert Johnson 41
James Brown 39
# Query a SQLite file
$ pirkle examples/company.sqlite --query "from employees | select {name, age} | take 5"
name age
---------------------
John Smith 32
Maria Garcia 28
Robert Johnson 41
Lisa Wang 35
Ahmed Hassan 29
# Alternative syntax using -- delimiter
$ pirkle examples/employees.csv -- "from employees | filter department == 'Engineering' | select {name, age}"
name age
---------------------
John Smith 32
Robert Johnson 41
Ahmed Hassan 29
Sarah Kim 31
Pirkle supports reading CSV data from standard input, making it easy to pipe data from other commands:
# Pipe data into pirkle
$ cat examples/employees.csv | pirkle stdin --query "from stdin | filter salary > 70000"
id name department age salary country
------------------------------------------------------------
1 John Smith Engineering 32 85000 USA
3 Robert Johnson Engineering 41 92000 USA
5 Ahmed Hassan Engineering 29 75000 Egypt
8 Sarah Kim Engineering 31 83000 South Korea
9 James Brown Sales 39 85000 USA
10 Fatima Al-Farsi Marketing 36 76000 UAE
# Use stdin with files
$ cat examples/orders.csv | pirkle stdin examples/customers.csv --query "from stdin | join customers (==customer_id)"
order_id customer_id amount region customer_id name region
------------------------------------------------------------------------------
1 100 250 North 100 Acme Corp North
2 101 300 South 101 Globex Inc South
3 100 150 North 100 Acme Corp North
4 102 400 West 102 Initech West
5 103 200 East 103 Stark Industries East
# Custom table name for stdin data
$ cat examples/employees.csv | pirkle stdin:workers --query "from workers | sort {-salary}"
id name department age salary country
-------------------------------------------------------------
3 Robert Johnson Engineering 41 92000 USA
1 John Smith Engineering 32 85000 USA
9 James Brown Sales 39 85000 USA
8 Sarah Kim Engineering 31 83000 South Korea
10 Fatima Al-Farsi Marketing 36 76000 UAE
5 Ahmed Hassan Engineering 29 75000 Egypt
4 Lisa Wang Marketing 35 70000 China
7 Carlos Rodriguez Marketing 33 68000 Spain
2 Maria Garcia Sales 28 65000 Mexico
6 Emma Wilson Sales 27 62000 UK
Pirkle integrates seamlessly with Unix pipelines:
# From curl/API responses
curl -s api.example.com/data.csv | pirkle stdin --query "from stdin | filter active == true"
# From other command output
cat *.csv | pirkle stdin --query "from stdin | group category (aggregate {count = count this})"
# Complex pipeline
grep "ERROR" logs.csv | pirkle stdin --query "
from stdin
| derive hour = (timestamp | date.truncate hour)
| group hour (aggregate {error_count = count this})
| sort hour"
Stdin handling at a glance:
- Use stdin to read CSV data from standard input
- Use stdin:tablename to register the stdin data under a custom table name
- If no query is given via --query or --, Pirkle reads the query itself from stdin:
$ echo "from employees | filter country == 'USA'" | pirkle examples/employees.csv
$ cat examples/employees.csv | pirkle stdin:workers stdin:staff --query "from workers | join staff (==id)"
Pirkle intelligently determines how to use stdin based on your command arguments, making it a flexible tool for data pipelines.
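The three modes side by side, using the bundled example data (queries shortened for illustration):
# Data on stdin, default table name "stdin"
cat examples/employees.csv | pirkle stdin --query "from stdin | take 2"
# Data on stdin, custom table name
cat examples/employees.csv | pirkle stdin:workers --query "from workers | take 2"
# Query on stdin, data from a file
echo "from employees | take 2" | pirkle examples/employees.csv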
To see the structure of your tables:
# View schemas with the --schema flag
$ pirkle examples/employees.csv --schema
Table: employees
Columns:
id (INTEGER)
name (TEXT)
department (TEXT)
age (INTEGER)
salary (INTEGER)
country (TEXT)
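With the schema in hand, you can reference columns directly; for example, to pull the three highest salaries:
pirkle examples/employees.csv --query "from employees | select {name, salary} | sort {-salary} | take 3"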
You can use the --show-sql flag to see the SQL that would be generated without executing the query:
$ pirkle examples/employees.csv --query "from employees | filter country == 'USA'" --show-sql
SELECT
*
FROM
employees
WHERE
country = 'USA'
-- Generated by PRQL compiler version:0.12.2 (https://prql-lang.org)
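Because the output is plain SQL (the trailing line is an ordinary SQL comment), you can pipe it straight into the sqlite3 shell when working with a SQLite database. A sketch, assuming sqlite3 is installed:
# Run the generated SQL directly against the database
pirkle examples/company.sqlite --query "from employees | take 5" --show-sql | sqlite3 examples/company.sqlite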
This also works with PRQL files:
$ pirkle examples/employees.csv --query examples/queries/avg_age_by_department.prql --show-sql
SELECT
department_id,
AVG(age) AS avg_age
FROM
employees
GROUP BY
department_id
-- Generated by PRQL compiler version:0.12.2 (https://prql-lang.org)
| Format | Use Case | Example |
|---|---|---|
| table | Human-readable terminal output | Data exploration |
| csv | Spreadsheet import, further processing | pirkle data.csv --format csv > result.csv |
| jsonl | API integration, log analysis | pirkle logs.csv --format jsonl \| jq '.' |
| logfmt | Structured logging, monitoring | Integration with log aggregators |
The default output is a human-readable table. To output CSV instead:
$ pirkle examples/employees.csv --format csv --query "from employees | filter salary > 70000"
1,John Smith,Engineering,32,85000,USA
3,Robert Johnson,Engineering,41,92000,USA
5,Ahmed Hassan,Engineering,29,75000,Egypt
8,Sarah Kim,Engineering,31,83000,South Korea
9,James Brown,Sales,39,85000,USA
10,Fatima Al-Farsi,Marketing,36,76000,UAE
Other supported formats:
# JSON Lines format
$ pirkle examples/employees.csv --format jsonl --query "from employees | filter country == 'USA'"
{"age":32,"country":"USA","department":"Engineering","id":1,"name":"John Smith","salary":85000}
{"age":41,"country":"USA","department":"Engineering","id":3,"name":"Robert Johnson","salary":92000}
{"age":39,"country":"USA","department":"Sales","id":9,"name":"James Brown","salary":85000}
# logfmt format
$ pirkle examples/employees.csv --format logfmt --query "from employees | filter country == 'USA'"
id="1" name="John Smith" department="Engineering" age="32" salary="85000" country="USA"
id="3" name="Robert Johnson" department="Engineering" age="41" salary="92000" country="USA"
id="9" name="James Brown" department="Sales" age="39" salary="85000" country="USA"
You can use prewritten PRQL query files:
# Use a PRQL file directly with --query
$ pirkle examples/employees.csv --query examples/queries/top_5_paid.prql
name department salary
---------------------------------------
Robert Johnson Engineering 92000
John Smith Engineering 85000
James Brown Sales 85000
Sarah Kim Engineering 83000
Fatima Al-Farsi Marketing 76000
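A query file is just PRQL text. Judging from the output above, examples/queries/top_5_paid.prql presumably contains a pipeline along these lines:
from employees
select {name, department, salary}
sort {-salary}
take 5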
To join tables, use the join operation:
$ pirkle examples/orders.csv examples/customers.csv --query "from orders
join customers (==customer_id)
select {orders.order_id, customers.name, orders.amount}"
order_id name amount
-------------------------------
1 Acme Corp 250
2 Globex Inc 300
3 Acme Corp 150
4 Initech 400
5 Stark Industries 200
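The join above is an inner join. PRQL also supports an explicit join side, so to keep orders without a matching customer you could write something like:
pirkle examples/orders.csv examples/customers.csv --query "from orders
join side:left customers (==customer_id)
select {orders.order_id, customers.name, orders.amount}"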
# Find average salary and employee count by department
pirkle employees.csv --query "
from employees
| group department (aggregate {
avg_salary = average salary,
count = count this
})
| sort {-avg_salary}
| take 5"
# Remove duplicates and filter valid records
pirkle messy_data.csv --query "
from messy_data
| filter email != null
| group email (take 1)
| select {name, email, phone}"
# Combine sales data with customer information
pirkle sales.csv customers.csv --query "
from sales
| join customers (==customer_id)
| group customers.region (aggregate {total_sales = sum sales.amount})
| sort {-total_sales}"
# Analyze daily sales trends
pirkle transactions.csv --query "
from transactions
| derive date = (timestamp | date.truncate day)
| group date (aggregate {
daily_sales = sum amount,
transaction_count = count this
})
| sort date"
# Quick summary statistics
pirkle dataset.csv --query "
from dataset
| aggregate {
min_value = min price,
max_value = max price,
avg_value = average price,
total_records = count this
}"
File not found errors
# Ensure file paths are correct
pirkle ./data/employees.csv --schema
Query syntax errors
# Use --show-sql to debug generated SQL
pirkle data.csv --query "your query here" --show-sql
Large file performance
# For very large files, consider filtering early in the pipeline
pirkle large_file.csv --query "from large_file | filter date > @2024-01-01 | ..."
Memory issues with large datasets
# Process data in chunks or use more specific filters
pirkle huge_file.csv --query "from huge_file | filter region == 'US' | take 1000"
Included example files:
- examples/employees.csv: Employee data with department, salary, and country information
- examples/departments.csv: Department names and IDs
- examples/customers.csv, examples/orders.csv: Customer-order relationship data
- examples/queries/*.prql: Sample PRQL queries
License
MIT