batuta

Crates.io: batuta
lib.rs: batuta
version: 0.5.0
created_at: 2025-11-21 08:04:54.931227+00
updated_at: 2026-01-14 14:31:36.661029+00
description: Orchestration framework for converting ANY project (Python, C/C++, Shell) to modern Rust
repository: https://github.com/paiml/Batuta
id: 1943226
size: 4,464,908
Noah Gift (noahgift)

batuta

Orchestration framework for the Sovereign AI Stack — privacy-preserving ML infrastructure in pure Rust

Overview

Batuta coordinates the Sovereign AI Stack, a comprehensive pure-Rust ecosystem for organizations requiring complete control over their ML infrastructure. The stack enables privacy-preserving inference, model management, and data processing without external cloud dependencies.

Key Capabilities

  • Privacy Tiers: Sovereign (local-only), Private (VPC), Standard (cloud-enabled)
  • Model Security: Ed25519 signatures, ChaCha20-Poly1305 encryption, BLAKE3 content addressing
  • API Compatibility: OpenAI-compatible endpoints for drop-in replacement
  • Observability: Prometheus metrics, distributed tracing, A/B testing
  • Cost Control: Circuit breakers with configurable daily budgets
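
As a rough illustration of the cost-control idea above, here is a minimal sketch of a daily-budget circuit breaker. The names `CostBreaker`, `Decision`, `try_spend`, and `reset_day` are hypothetical and are not taken from batuta's API; amounts are tracked in integer cents as a simplifying assumption.

```rust
/// Outcome of asking the breaker whether a request may proceed.
#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    Allow,
    /// Breaker is open: the request would exceed today's budget.
    Tripped { remaining_cents: u64 },
}

/// Hypothetical daily-budget circuit breaker (illustration only, not batuta's API).
/// Amounts are integer cents to avoid floating-point comparisons.
#[derive(Debug)]
pub struct CostBreaker {
    daily_budget_cents: u64,
    spent_cents: u64,
}

impl CostBreaker {
    pub fn new(daily_budget_cents: u64) -> Self {
        Self { daily_budget_cents, spent_cents: 0 }
    }

    /// Record the cost if it fits in the remaining budget; otherwise refuse
    /// the request and leave the running total unchanged.
    pub fn try_spend(&mut self, cost_cents: u64) -> Decision {
        let remaining = self.daily_budget_cents.saturating_sub(self.spent_cents);
        if cost_cents <= remaining {
            self.spent_cents += cost_cents;
            Decision::Allow
        } else {
            Decision::Tripped { remaining_cents: remaining }
        }
    }

    /// Close the breaker again at the start of a new day.
    pub fn reset_day(&mut self) {
        self.spent_cents = 0;
    }
}
```

A caller would check `try_spend` before dispatching a paid request and fall back to a local backend (or fail fast) when it returns `Tripped`.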

Installation

cargo install batuta

Or add to your Cargo.toml:

[dependencies]
batuta = "0.5"

Quick Start

# Analyze project structure and dependencies
batuta analyze --languages --dependencies --tdg

# Query the Sovereign AI Stack
batuta oracle "How do I serve a Llama model locally?"

# Model registry operations
batuta pacha pull llama3-8b-q4
batuta pacha sign model.gguf --identity alice@example.com
batuta pacha verify model.gguf

# Encrypt models for distribution
batuta pacha encrypt model.gguf --password-env MODEL_KEY
batuta pacha decrypt model.gguf.enc --password-env MODEL_KEY

Demo

Live Demo: paiml.github.io/batuta | API Docs

Example Output (batuta analyze --tdg):

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  📊 Technical Debt Gradient Analysis
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

  Project: my-project
  Language: Rust (confidence: 98%)

  Metrics:
    Cyclomatic Complexity:  4.2 avg (good)
    Test Coverage:          87% (A-)
    Documentation:          92% (A)
    Dependency Health:      95% (A+)

  TDG Score: 91.5/100 (A)

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Stack Components

Batuta orchestrates a layered architecture of pure-Rust components:

┌─────────────────────────────────────────────────────────────┐
│                    batuta v0.4.8                            │
│                 (Orchestration Layer)                       │
├─────────────────────────────────────────────────────────────┤
│     realizar v0.5        │         pacha v0.2               │
│   (Inference Engine)     │      (Model Registry)            │
├──────────────────────────┴──────────────────────────────────┤
│   aprender v0.24   │  entrenar v0.5  │  alimentar v0.2      │
│    (ML Algorithms) │    (Training)   │   (Data Loading)     │
├─────────────────────────────────────────────────────────────┤
│   trueno v0.11     │  repartir v2.0  │   renacer v0.9       │
│ (SIMD/GPU Compute) │  (Distributed)  │  (Syscall Tracing)   │
└─────────────────────────────────────────────────────────────┘

Core Components

Component  Version  Description
trueno     0.11     SIMD/GPU compute primitives (AVX2/AVX-512/NEON, wgpu)
aprender   0.24     ML algorithms: regression, trees, clustering, NAS
entrenar   0.5      Training: autograd, LoRA/QLoRA, quantization
realizar   0.5      Inference engine for GGUF/SafeTensors models
pacha      0.2      Model registry with signatures, encryption, lineage
repartir   2.0      Distributed compute (CPU/GPU/Remote executors)
renacer    0.9      Syscall tracing with semantic validation
batuta     0.4      Stack orchestration, drift detection, CLI

Extended Ecosystem

Component     Version  Description
trueno-db     0.3      GPU-accelerated analytics database
trueno-graph  0.1      Graph database for code analysis
trueno-rag    0.1      RAG pipeline (chunking, BM25+vector, RRF)
trueno-viz    0.1      Terminal/PNG visualization
alimentar     0.2      Zero-copy Parquet/Arrow data loading
whisper-apr   0.1      Pure Rust Whisper ASR (WASM-first)
jugar         0.1      Game engine (ECS, physics, AI, WASM)
simular       0.3      Simulation engine (Monte Carlo, physics)
bashrs        6.53     Shell-to-Rust transpiler and linter
presentar     0.3      Terminal presentation framework
pmat          2.213    Project quality analysis toolkit

Commands

batuta analyze

Analyze project structure, languages, and dependencies:

batuta analyze --languages --dependencies --tdg

# Output:
# Primary language: Python
# Dependencies: pip (42 packages), ML frameworks detected
# TDG Score: 73.2/100 (B)
# Recommended: Use Aprender for ML, Realizar for inference

batuta oracle

Query the stack for component recommendations:

# Natural language queries
batuta oracle "Train random forest on 1M samples"

# List all components
batuta oracle --list

# Component details
batuta oracle --show realizar

# Interactive mode
batuta oracle --interactive

batuta pacha

Model registry operations:

# Pull models from registry
batuta pacha pull llama3-8b-q4

# Generate signing keys
batuta pacha keygen --identity alice@example.com

# Sign models for distribution
batuta pacha sign model.gguf --identity alice@example.com

# Verify model signatures
batuta pacha verify model.gguf

# Encrypt models at rest
batuta pacha encrypt model.gguf --password-env MODEL_KEY

# Decrypt for inference
batuta pacha decrypt model.gguf.enc --password-env MODEL_KEY

batuta content

Generate structured content with quality constraints:

# Available content types
batuta content types

# Generate book chapter prompt
batuta content emit --type bch --title "Error Handling" --audience "developers"

# Validate content quality
batuta content validate --type bch chapter.md

batuta stack

Manage the Sovereign AI Stack ecosystem:

# Check stack component versions
batuta stack versions

# Detect version drift across published crates
batuta stack drift

# Generate fix commands for drift issues
batuta stack drift --fix --workspace ~/src

# Check which crates need publishing
batuta stack publish-status

# Quality gate for CI/pre-commit
batuta stack gate

Automatic Drift Detection: Batuta refuses to run any command when a published stack crate depends on an outdated version of another stack crate. Pass --unsafe-skip-drift-check to bypass the check in an emergency.
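
To make the drift rule concrete, the following is a minimal sketch of one way such a check could work. It is an assumption-laden model, not batuta's implementation: the `Published` and `Drift` types, the `detect_drift` function, and the reduction of versions to `(major, minor)` pairs are all hypothetical.

```rust
use std::collections::BTreeMap;

/// Simplified version: (major, minor). Patch level is ignored in this sketch.
type Version = (u32, u32);

/// A published stack crate and the stack crates it depends on (hypothetical model).
pub struct Published {
    pub name: &'static str,
    pub version: Version,
    pub stack_deps: Vec<(&'static str, Version)>,
}

/// One drift finding: `krate` requires an older `dependency` than the latest release.
#[derive(Debug, PartialEq)]
pub struct Drift {
    pub krate: &'static str,
    pub dependency: &'static str,
    pub required: Version,
    pub latest: Version,
}

/// Report every stack dependency whose required version is behind the
/// latest published version of that same stack crate.
pub fn detect_drift(stack: &[Published]) -> Vec<Drift> {
    let latest: BTreeMap<&str, Version> =
        stack.iter().map(|c| (c.name, c.version)).collect();

    let mut drifts = Vec::new();
    for c in stack {
        for &(dep, required) in &c.stack_deps {
            if let Some(&latest_version) = latest.get(dep) {
                // Tuples compare lexicographically: major first, then minor.
                if required < latest_version {
                    drifts.push(Drift {
                        krate: c.name,
                        dependency: dep,
                        required,
                        latest: latest_version,
                    });
                }
            }
        }
    }
    drifts
}
```

Under this model, a gate would simply refuse to proceed whenever `detect_drift` returns a non-empty list, unless the caller explicitly opted out.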

Privacy Tiers

The stack enforces data sovereignty through configurable privacy tiers:

Tier       Behavior                       Use Case
Sovereign  Blocks ALL external API calls  Healthcare, Government
Private    VPC/dedicated endpoints only   Financial services
Standard   Public APIs allowed            General deployment

use batuta::serve::{BackendSelector, PrivacyTier};

let selector = BackendSelector::new()
    .with_privacy(PrivacyTier::Sovereign);

// Returns only local backends: Realizar, Ollama, LlamaCpp
let backends = selector.recommend();
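
The tier table can also be read as a simple filtering rule. The sketch below is a standalone model of that rule, not batuta's internals: the `Reach` enum, the `Backend` struct, and the `allowed` function are hypothetical, and it assumes that the Private tier also permits local backends.

```rust
/// How far a backend's traffic travels. Declaration order gives Local < Vpc < Public.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Reach {
    Local,
    Vpc,
    Public,
}

/// Tier names follow the table above; the mapping to `Reach` is an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PrivacyTier {
    Sovereign,
    Private,
    Standard,
}

impl PrivacyTier {
    /// The widest reach this tier tolerates.
    fn max_reach(self) -> Reach {
        match self {
            PrivacyTier::Sovereign => Reach::Local,
            PrivacyTier::Private => Reach::Vpc,
            PrivacyTier::Standard => Reach::Public,
        }
    }
}

/// Hypothetical backend descriptor.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Backend {
    pub name: &'static str,
    pub reach: Reach,
}

/// Keep only the backends whose reach stays within the tier's limit.
pub fn allowed(tier: PrivacyTier, backends: &[Backend]) -> Vec<&'static str> {
    backends
        .iter()
        .filter(|b| b.reach <= tier.max_reach())
        .map(|b| b.name)
        .collect()
}
```

Encoding the tiers as an ordered reach limit keeps the rule to a single comparison, so adding a backend never requires touching the tier logic.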

Model Security

Digital Signatures (Ed25519)

Verify model integrity before loading:

use pacha::signing::{SigningKey, sign_model, verify_model};

let signing_key = SigningKey::generate();
let signature = sign_model(&model_data, &signing_key)?;

// Verification fails if the model has been tampered with
verify_model(&model_data, &signature)?;

Encryption at Rest (ChaCha20-Poly1305)

Protect models during distribution:

use pacha::crypto::{encrypt_model, decrypt_model};

let encrypted = encrypt_model(&model_data, "password")?;
let decrypted = decrypt_model(&encrypted, "password")?;

Design Principles

Batuta applies Toyota Production System principles:

Principle  Application
Jidoka     Automatic failover with context preservation
Poka-Yoke  Privacy tiers prevent data leakage
Heijunka   Spillover routing for load leveling
Muda       Cost circuit breakers prevent waste
Kaizen     Continuous metrics and optimization
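
As one possible reading of the Heijunka row, spillover routing can be sketched as "fill the preferred lane first, then overflow to the next one". The `Lane` struct and `route` function below are hypothetical names used for illustration only; batuta's actual routing may differ.

```rust
/// A backend lane with a fixed number of concurrent slots (hypothetical model).
pub struct Lane {
    pub name: &'static str,
    pub in_flight: usize,
    pub capacity: usize,
}

/// Assign a request to the first lane, in priority order, that has a free slot.
/// Returns `None` when every lane is saturated.
pub fn route(lanes: &mut [Lane]) -> Option<&'static str> {
    for lane in lanes.iter_mut() {
        if lane.in_flight < lane.capacity {
            lane.in_flight += 1;
            return Some(lane.name);
        }
    }
    None
}
```

With a two-slot "local" lane followed by a one-slot "spillover" lane, the first two requests stay local, the third spills over, and the fourth is rejected until a slot frees up.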

Development

# Clone repository
git clone https://github.com/paiml/batuta.git
cd batuta

# Build
cargo build --release

# Run tests
cargo test

# Build documentation
mdbook build book

Contributing

Contributions are welcome! Please follow these guidelines:

  1. Fork the repository and create your branch from main
  2. Run tests before submitting: cargo test --all-features
  3. Run lints: cargo clippy --all-targets --all-features -- -D warnings
  4. Format code: cargo fmt --all
  5. Update documentation for any API changes
  6. Submit a pull request with a clear description

See our CI workflow for the full test suite.

License

MIT License — see LICENSE for details.

Batuta — Orchestrating sovereign AI infrastructure.

Commit count: 223