| Crates.io | llm-bucket |
| lib.rs | llm-bucket |
| version | 0.2.0 |
| created_at | 2025-07-12 06:12:36.205942+00 |
| updated_at | 2025-07-12 18:58:21.105983+00 |
| description | Open source core logic and pipelines for synchronising a bucket with content for LLMs to consume with RAG. |
| homepage | |
| repository | https://github.com/kasbuunk/llm-bucket |
| max_upload_size | |
| id | 1749017 |
| size | 201,961 |
A fast, structured CLI utility for aggregating knowledge snapshots from Git repositories (and soon other sources) into ready-to-ingest local outputs and/or uploading them to an API for knowledge base workflows, LLM training, and auditing. Designed for automation, repeatability, and clean output.
git clone https://github.com/kasbuunk/llm-bucket.git
cd llm-bucket
cargo build --release
The executable will be at ./target/release/llm-bucket.
All actions are configured in a YAML file. There are no command-line flags for input sources.
Example (config.yaml):

download:
  output_dir: ./output
  sources:
    - type: git
      repo_url: "https://github.com/youruser/yourrepo.git"
      reference: main  # optional: branch/tag/commit
process:
  kind: FlattenFiles  # or ReadmeToPDF
- output_dir: Root directory for clones and processed data (recommended: gitignore this in production).
- sources: List of source blocks. Currently only type: git is supported.
  - repo_url: HTTPS or SSH URL for the git repo.
  - reference: Optional; branch/tag/commit (default: main).
- process.kind: Currently accepts:
  - FlattenFiles: Flatten all files for upload.
  - ReadmeToPDF: Convert repository README.md to PDF (if implemented for your repo).

After configuring config.yaml, run:
./target/release/llm-bucket sync --config config.yaml
Output is written to output_dir (subdirectories per repo, deterministic naming).

Note: The only subcommand available is sync (see below).
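The configuration fields described above can be modelled in Rust roughly as follows. This is a minimal sketch, not the crate's actual types: the names `ProcessKind`, `GitSource`, and `effective_reference` are hypothetical and chosen only for illustration. It shows the two documented `process.kind` values and the documented fallback to `main` when `reference` is omitted.

```rust
use std::str::FromStr;

/// The two process.kind values named in this README (type name is illustrative).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ProcessKind {
    FlattenFiles,
    ReadmeToPDF,
}

impl FromStr for ProcessKind {
    type Err = String;

    /// Accept exactly the spellings used in config.yaml; reject anything else.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "FlattenFiles" => Ok(ProcessKind::FlattenFiles),
            "ReadmeToPDF" => Ok(ProcessKind::ReadmeToPDF),
            other => Err(format!("unsupported process.kind: {other}")),
        }
    }
}

/// Hypothetical model of one `type: git` source entry.
#[derive(Debug, Clone, PartialEq)]
struct GitSource {
    repo_url: String,
    reference: Option<String>,
}

impl GitSource {
    /// Fall back to "main" when no reference is configured, as documented above.
    fn effective_reference(&self) -> &str {
        self.reference.as_deref().unwrap_or("main")
    }
}
```

Rejecting unknown `process.kind` strings at parse time keeps a typo in the config from silently selecting a default pipeline.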
Uploading to a remote knowledge base/API requires these environment variables:
- BUCKET_ID: Integer bucket/project ID (provided by backend/admin)
- OCP_APIM_SUBSCRIPTION_KEY: API key/token for upload

You can use a .env file (auto-loaded by the CLI) or set variables in your environment:
export BUCKET_ID=1234
export OCP_APIM_SUBSCRIPTION_KEY=your-token-here
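For readers wiring this into their own tooling, the sketch below shows one way the two variables could be read and validated with the standard library. It is an illustration under assumptions, not the CLI's actual code: `UploadSettings`, `parse_upload_settings`, and `upload_settings_from_env` are hypothetical names, and `.env` loading is not shown.

```rust
use std::env;

/// Hypothetical container for the upload settings; field names are illustrative.
#[derive(Debug, PartialEq)]
struct UploadSettings {
    bucket_id: i64,
    subscription_key: String,
}

/// Validate raw values. Kept separate from environment access so it is easy to test.
fn parse_upload_settings(bucket_id: &str, key: &str) -> Result<UploadSettings, String> {
    // BUCKET_ID is documented as an integer, so reject anything else early.
    let bucket_id = bucket_id
        .trim()
        .parse::<i64>()
        .map_err(|e| format!("BUCKET_ID must be an integer: {e}"))?;
    if key.trim().is_empty() {
        return Err("OCP_APIM_SUBSCRIPTION_KEY must not be empty".to_string());
    }
    Ok(UploadSettings {
        bucket_id,
        subscription_key: key.trim().to_string(),
    })
}

/// Read both variables from the process environment and validate them.
fn upload_settings_from_env() -> Result<UploadSettings, String> {
    let bucket_id = env::var("BUCKET_ID").map_err(|_| "BUCKET_ID is not set".to_string())?;
    let key = env::var("OCP_APIM_SUBSCRIPTION_KEY")
        .map_err(|_| "OCP_APIM_SUBSCRIPTION_KEY is not set".to_string())?;
    parse_upload_settings(&bucket_id, &key)
}
```

Checking both values before any network call means a missing or malformed variable fails fast with a clear message instead of surfacing as an upload error.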
Each source is written to its own subdirectory of output_dir.
The contents of each subdirectory depend on process.kind.

./output/
  git_github_com_youruser_yourrepo_git_main/
    src/
    README.md
    ...
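The directory name in the example above appears to combine the source type, the repository URL, and the reference, with punctuation replaced by underscores. The sketch below reproduces that one example; the rule is inferred from the sample name only, so the function `source_dir_name` is hypothetical and the crate's real naming logic may differ.

```rust
/// Hypothetical helper: derive a deterministic directory name for a source.
/// The rule is inferred from the single example in this README.
fn source_dir_name(source_type: &str, repo_url: &str, reference: &str) -> String {
    // Drop a URL scheme such as "https://" if one is present.
    let without_scheme = match repo_url.find("://") {
        Some(idx) => &repo_url[idx + 3..],
        None => repo_url,
    };
    // Replace every character that is not ASCII alphanumeric with an underscore.
    let sanitize = |s: &str| -> String {
        s.chars()
            .map(|c| if c.is_ascii_alphanumeric() { c } else { '_' })
            .collect()
    };
    format!(
        "{}_{}_{}",
        source_type,
        sanitize(without_scheme),
        sanitize(reference)
    )
}
```

Because the name depends only on the configured values, repeated runs with the same config write to the same directory.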
Run all checks and tests:
cargo test
Tests cover:
Q: How do I add a new source type?
A: See the src/ directory for the modular structure; implement a new SourceAction and extend the YAML loader, then add download, processing, and (optionally) upload logic.
Q: Is interactive usage supported?
A: No. llm-bucket is for declarative, repeatable workflows and is driven only by config files.
Q: Is output safe for public commit?
A: No. Output is meant for ingestion/upload, not VCS; it should be gitignored.
MIT (see LICENSE).
For design notes and future directions, see notes.md.