| Crates.io | specds |
| lib.rs | specds |
| version | 0.1.0 |
| created_at | 2025-07-21 06:26:38.982637+00 |
| updated_at | 2025-07-21 06:26:38.982637+00 |
| description | A spec-driven data science pipeline generator using LLMs |
| homepage | |
| repository | https://github.com/renbytes/specds |
| max_upload_size | |
| id | 1761789 |
| size | 9,681,719 |
specds

A high-performance, enterprise-grade CLI tool written in Rust that generates complete, tested data science analysis pipelines from user specifications using LLMs.
This tool streamlines the process of creating boilerplate data analysis code. Instead of writing scripts from scratch, a user provides a high-level specification, and the tool generates the corresponding code in Python or SQL, complete with functions, tests, and best practices built-in.
It is inspired by a talk on spec-driven development by Sean Grove of OpenAI.
The init command gets you started in seconds.

Prerequisites: a working Rust toolchain. If you don't have one, install it via rustup:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Clone the repository:

git clone git@github.com:renbytes/specds.git
cd specds
Copy the example .env file and add your API key for OpenAI, Gemini, or both.
cp .env.example .env
# Edit .env and add your key:
# OPENAI_API_KEY="sk-..."
# GEMINI_API_KEY="AI..."
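Since the keys are loaded from the environment, exporting one directly in your shell should also work; this is an assumption about how the tool reads configuration rather than documented behavior:

# Assumed alternative to the .env file:
export OPENAI_API_KEY="sk-..."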
Compile the project in release mode for optimal performance.
cargo build --release
Make the command globally available:
sudo cp ./target/release/specds /usr/local/bin/
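To confirm the installation, print the built-in help. specds is a Rust CLI, so a --help flag is the usual convention, though the exact output depends on the version you built:

specds --help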
There are two primary ways to use this tool: the simple File-Based Workflow (recommended for getting started) and the powerful Flag-Based Workflow (ideal for automation).
The file-based workflow is the easiest way to get started.
Step 1: Initialize a spec file
specds init
This creates a spec.toml file with helpful comments and examples.
Step 2: Edit spec.toml
Open the newly created spec.toml file and fill in your analysis details:
# spec.toml
language = "Python"
analysis_type = "Simple Aggregation"
description = "A weekly report on new user signups."
[[dataset]]
name = "user_events"
description = "Primary input dataset."
sample_data_path = "path/to/your/sample_data.csv"
[[metric]]
name = "new_signups"
logic = "Users where event_type is 'signup' and is_new_user is true"
aggregation = "CountDistinct"
aggregation_field = "user_id"
Step 3: Generate the code
specds generate --spec spec.toml --provider gemini --model gemini-2.5-pro
Note: you need to pick an LLM provider and model. Currently, OpenAI and Gemini (Google) are supported.
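For example, to generate with OpenAI instead, swap the provider flag (the model name below is illustrative; substitute whichever model your account can access):

specds generate --spec spec.toml --provider openai --model gpt-4o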
The flag-based workflow is ideal for scripting and automation:
specds generate \
--language python \
--description "A weekly report on new user signups." \
--analysis-type "Simple Aggregation" \
--dataset-name "user_events" \
--sample-data-path ./sample_data.csv \
--metric-name "new_signups" \
--metric-logic "Users where event_type is 'signup' and is_new_user is true" \
--aggregation count-distinct \
--aggregation-field "user_id"
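Because every input is a flag, the command composes well with shell scripting. Here is a minimal sketch that generates one pipeline per regional dataset; the region names and sample file paths are hypothetical:

# Loop over hypothetical regions, producing one generated pipeline each.
for region in us eu; do
  specds generate \
    --language python \
    --description "A weekly report on new user signups (${region})." \
    --analysis-type "Simple Aggregation" \
    --dataset-name "user_events_${region}" \
    --sample-data-path "./sample_data_${region}.csv" \
    --metric-name "new_signups" \
    --metric-logic "Users where event_type is 'signup' and is_new_user is true" \
    --aggregation count-distinct \
    --aggregation-field "user_id"
done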
Either workflow creates a new directory inside generated_jobs/ with a timestamp, containing your complete, tested analysis pipeline:
generated_jobs/
└── python/
└── simple-aggregation/
└── 20250720-193000__a-weekly-report-on-new-user-signups/
├── job.py
├── functions.py
├── tests/
│ ├── test_job.py
│ └── test_functions.py
└── README.md
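The test_*.py names follow the pytest convention, so the generated tests can likely be run directly; this assumes pytest is available in your Python environment:

cd generated_jobs/python/simple-aggregation/20250720-193000__a-weekly-report-on-new-user-signups
pytest tests/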
Explore real-world use cases in the examples/ directory:
| Language | Framework | Use Case |
|---|---|---|
| Python | pandas | Data analysis, reporting |
| PySpark | Spark | Big data, distributed computing |
| SQL | dbt-style | Data warehousing, analytics |
To ensure code quality, run the following commands:
cargo fmt
cargo clippy -- -D warnings
cargo test

To contribute, create a feature branch, commit your changes, and push it to open a pull request:

git checkout -b feature/amazing-feature
git commit -m 'Add amazing feature'
git push origin feature/amazing-feature

This project is licensed under the MIT OR Apache-2.0 license.