| Field | Value |
|---|---|
| Crates.io | avocado-server |
| lib.rs | avocado-server |
| version | 2.2.0 |
| created_at | 2025-12-09 16:02:25.756309+00 |
| updated_at | 2025-12-10 17:23:29.87812+00 |
| description | HTTP server for AvocadoDB - deterministic context compilation for AI agents |
| homepage | https://avocadodb.ai |
| repository | https://github.com/avocadodb/avocadodb |
| max_upload_size | |
| id | 1975684 |
| size | 188,457 |
The first deterministic context database for AI agents
Fix your RAG in 5 minutes - same query, same context, every time.
AvocadoDB is a span-based context compiler that replaces traditional vector databases' chaotic "top-k" retrieval with deterministic, citation-backed context generation.
Pure Rust embeddings = 6x faster than OpenAI, works completely offline, costs $0.
Current RAG systems are fundamentally broken: the same query can return different context on every run, making agent behavior impossible to reproduce or debug.
# Run benchmarks on your hardware
./target/release/avocado benchmark
# Results (M1 Mac example):
# Single embedding: 1.2ms (vs ~250ms OpenAI)
# Batch of 100: 8.7ms (vs ~250ms OpenAI)
# Full compilation: 43ms (vs ~300ms OpenAI)
#
# Speedup: 6-7x faster ⚡
# Cost: $0 (vs ~$0.0001 per 1K tokens)
See EMBEDDING_PERFORMANCE.md for detailed benchmarks.
cargo install avocado-cli
That's it! Now you can use avocado directly:
avocado --version
avocado init
avocado ingest ./docs --recursive
avocado compile "your query"
Run the server with Docker:
# Run with Docker
docker run -d \
-p 8765:8765 \
-v avocado-data:/data \
--name avocadodb \
avocadodb/avocadodb:latest
# Or use Docker Compose
docker-compose up -d
# Test the server
curl http://localhost:8765/health
See Docker Guide for complete documentation.
# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# Clone and build
git clone https://github.com/avocadodb/avocadodb
cd avocadodb
cargo build --release
# Optional: Set OpenAI API key (only if you want to use OpenAI embeddings)
# By default, AvocadoDB uses local embeddings (no API key required, no Python required!)
#
# Local embeddings strategy (automatic, in priority order):
# 1. Pure Rust with fastembed (semantic, good quality, no Python required) ✅ DEFAULT
# - Uses all-MiniLM-L6-v2 model (384 dimensions) by default
# - ONNX-based, fast and efficient
# - Model downloaded automatically on first use (~90MB)
# - To increase dimensionality, set AVOCADODB_EMBEDDING_MODEL:
# * "nomic" or "nomicv15" → 768 dimensions (good balance)
# * "bgelarge" or "bge-large-en-v1.5" → 1024 dimensions (higher quality)
# 2. Python + sentence-transformers (fallback if fastembed unavailable)
# - Requires: pip install sentence-transformers
# 3. Hash-based fallback (deterministic, but NOT semantic)
# - Works always, but poor semantic quality
#
# To use OpenAI embeddings instead:
# export OPENAI_API_KEY="sk-..."
# export AVOCADODB_EMBEDDING_PROVIDER=openai
# Initialize database
./target/release/avocado init
# Get model recommendation (optional)
./target/release/avocado recommend --corpus-size 5000 --use-case production
# Recommends optimal embedding model for your use case
# Ingest documents
./target/release/avocado ingest ./docs --recursive
# Output: Ingested 42 files → 387 spans
# Compile context (uses daemon at http://localhost:8765 by default)
./target/release/avocado compile "How does authentication work?" --budget 8000
# Force local mode (uses .avocado/db.sqlite in current project)
./target/release/avocado compile "How does authentication work?" --local --budget 8000
# Run performance benchmarks
./target/release/avocado benchmark
# Shows real performance on your hardware
# Start the daemon with remote GPU embeddings (Modal)
avocado serve --gpu --embed-url https://<your-modal-endpoint>/embed
# or CPU/local (default)
avocado serve
Example Output:
Compiling context for: "How does authentication work?"
Token budget: 8000
[1] docs/authentication.md
Lines 1-23
# Authentication System
Our authentication uses JWT tokens with secure refresh mechanisms...
---
[2] src/middleware/auth.ts
Lines 45-78
export function authenticateRequest(req: Request) {
const token = req.headers.authorization?.split(' ')[1];
if (!token) throw new UnauthorizedError();
...
}
---
Compiled 12 spans using 7,891 tokens (98.6% utilization)
Compilation time: 243ms
Context hash: e3b0c4429...52b855 (deterministic ✓)
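The context hash above can be reproduced by hashing the compiled text. A minimal sketch of the idea, assuming a SHA-256 digest over the final context string (consistent with the 64-hex-character hash shown):

```python
import hashlib

def context_hash(compiled_text: str) -> str:
    """SHA-256 over the compiled context: identical input, identical hash."""
    return hashlib.sha256(compiled_text.encode("utf-8")).hexdigest()

# Determinism check: hashing the same compiled context twice
h1 = context_hash("[1] docs/authentication.md\nLines 1-23\n...")
h2 = context_hash("[1] docs/authentication.md\nLines 1-23\n...")
assert h1 == h2
```

Because the compiler's output is byte-for-byte stable, the hash doubles as a cache key and a regression check across runs.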
cd sdks/python
pip install -e .
from avocado import AvocadoDB
db = AvocadoDB()
db.ingest("./docs", recursive=True)
result = db.compile("my query", budget=8000)
print(result.text) # Deterministic every time
cd sdks/typescript
npm install
npm run build
import { AvocadoDB } from 'avocadodb';
const db = new AvocadoDB();
await db.ingest('./docs', { recursive: true });
const result = await db.compile('my query', { budget: 8000 });
console.log(result.text); // Deterministic every time
# Start server (binds to 127.0.0.1 by default)
./target/release/avocado-server
# Use the API
curl -X POST http://localhost:8765/compile \
-H "Content-Type: application/json" \
-d '{"query": "authentication", "token_budget": 8000, "project": "'"$PWD"'"}'
AvocadoDB is production-ready with full Docker and Kubernetes support.
Two Docker Images Available:
| Image | Contents | Use Case |
|---|---|---|
| avocadodb/avocadodb:latest | Rust server + SQLite | Standalone, zero-config |
| avocadodb/postgres:pg16 | PostgreSQL + pgvector + avocado extension | Production, native SQL |
# Quick start with Docker (standalone)
docker run -d -p 8765:8765 -v avocado-data:/data avocadodb/avocadodb:latest
# Or use Docker Compose
docker-compose up -d
# PostgreSQL Extension (native SQL)
docker compose --profile pgext up -d postgres-avocado
# Then connect: psql postgres://avocado:changeme@localhost:5432/avocadodb
Features:
See Docker Guide for complete documentation.
AvocadoDB is available as a native PostgreSQL extension, enabling deterministic context compilation directly in SQL:
-- Enable extensions
CREATE EXTENSION vector;
CREATE EXTENSION avocado;
-- Configure embedding provider (optional - defaults to fastembed)
SELECT avocado_set_embedding_provider('ollama');
SELECT avocado_set_ollama_config('http://localhost:11434', 'bge-m3');
-- Ingest documents
SELECT avocado_ingest_artifact('docs/auth.md', 'Authentication uses JWT tokens...');
-- Compile context directly in SQL
SELECT avocado_compile('How does authentication work?', '{"token_budget": 4000}'::jsonb);
-- Session management
SELECT avocado_create_session('user@example.com', 'Support Chat');
SELECT avocado_add_message('session-id', 'user', 'How do I login?', NULL);
SELECT avocado_get_conversation_history('session-id', 8000);
-- Multi-agent orchestration
SELECT avocado_register_agent('moderator', 'Tech Moderator', 'gpt-4', 'You are a moderator...');
SELECT avocado_get_agent_relations('session-id');
-- Check stats
SELECT avocado_stats();
Features:
# Deploy to Kubernetes
kubectl apply -k k8s/
# Verify deployment
kubectl get pods -l app=avocadodb
Includes:
See Kubernetes Guide for complete documentation.
| Variable | Default | Description |
|---|---|---|
| PORT | 8765 | HTTP server port |
| BIND_ADDR | 127.0.0.1 | Bind address (set 0.0.0.0 to expose publicly) |
| RUST_LOG | info | Log level |
| AVOCADODB_EMBEDDING_PROVIDER | local | Provider: local, ollama, or openai |
| AVOCADODB_EMBEDDING_MODEL | minilm | Fastembed model (minilm, nomic, bgelarge) |
| AVOCADODB_OLLAMA_URL | http://localhost:11434 | Ollama server URL |
| AVOCADODB_OLLAMA_MODEL | bge-m3 | Ollama model name |
| OPENAI_API_KEY | - | OpenAI API key (if using OpenAI) |
| AVOCADODB_ROOT | unset | Optional project root. When set, all project paths must be under this directory; requests outside it are rejected. |
| API_TOKEN | unset | If set, the X-Avocado-Token header must be present and match on all routes (except /health and /api-docs/*). |
| MAX_BODY_BYTES | 2097152 (2MB) | Request body size limit to protect against large payloads. |
Security note: the server binds to 127.0.0.1 by default. Only set BIND_ADDR=0.0.0.0 if you front it with auth (e.g. API_TOKEN). Clients send a project path (their current working directory); the server normalizes paths and can restrict them to AVOCADODB_ROOT.

Compilation pipeline:

Query → Embed → [Semantic Search + Lexical Search] → Hybrid Fusion
→ MMR Diversification → Token Packing → Deterministic Sort → WorkingSet

Final spans are sorted by (artifact_id, start_line) for reproducibility.

NEW in v2.1: Enhanced determinism, explainability, and quality tracking features based on production feedback.
Every compilation now includes a version manifest for full reproducibility:
// Access manifest from WorkingSet
let manifest = working_set.manifest.unwrap();
println!("Avocado version: {}", manifest.avocado_version);
println!("Embedding model: {}", manifest.embedding_model);
println!("Context hash: {}", manifest.context_hash);
The manifest includes: avocado version, tokenizer, embedding model, embedding dimensions, chunking params, index params, and a SHA256 context hash.
Understand exactly how context was selected with explain mode:
# CLI with explain
avocado compile "authentication" --explain
# Shows candidates at each pipeline stage:
# - Semantic search (top 50 from HNSW)
# - Lexical search (keyword matches)
# - Hybrid fusion (RRF combination)
# - MMR diversification
# - Token packing
# - Final deterministic order
# Python SDK
result = db.compile("auth", budget=8000, explain=True)
if result.explain:
print(f"Semantic candidates: {len(result.explain.semantic_candidates)}")
print(f"Final spans: {len(result.explain.final_order)}")
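One of the stages surfaced by the explain plan, hybrid fusion, merges the semantic and lexical rankings. A generic reciprocal rank fusion (RRF) sketch; AvocadoDB's exact weighting is not shown here, so treat this as illustrative:

```python
def rrf_fuse(semantic: list[str], lexical: list[str], k: int = 60) -> list[str]:
    """Combine two ranked lists with Reciprocal Rank Fusion.

    Each span earns 1 / (k + rank) per list it appears in; ties are
    broken by span id so the output order is fully deterministic.
    """
    scores: dict[str, float] = {}
    for ranked in (semantic, lexical):
        for rank, span_id in enumerate(ranked, start=1):
            scores[span_id] = scores.get(span_id, 0.0) + 1.0 / (k + rank)
    # Sort by score descending, then by span id for a stable order
    return sorted(scores, key=lambda s: (-scores[s], s))

fused = rrf_fuse(["a", "b", "c"], ["b", "d"])
# "b" ranks first because it appears in both lists
```

The explicit tie-break on span id is what keeps the fused order reproducible even when two spans score identically.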
Compare retrieval results across corpus versions for auditing:
use avocado_core::{diff_working_sets, summarize_diff};
let diff = diff_working_sets(&before, &after);
println!("{}", summarize_diff(&diff));
// Output: "3 added, 1 removed, 2 reranked"
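`diff_working_sets` and `summarize_diff` are Rust APIs; a hypothetical Python analogue over two ordered lists of span ids shows how such a diff could be computed (illustrative, not AvocadoDB's implementation):

```python
def diff_working_sets(before: list[str], after: list[str]) -> dict:
    """Compare two ordered working sets of span ids."""
    before_set, after_set = set(before), set(after)
    added = [s for s in after if s not in before_set]
    removed = [s for s in before if s not in after_set]
    common = [s for s in before if s in after_set]
    # A surviving span is "reranked" if its relative position changed
    after_pos = {s: i for i, s in enumerate(x for x in after if x in before_set)}
    reranked = [s for i, s in enumerate(common) if after_pos[s] != i]
    return {"added": added, "removed": removed, "reranked": reranked}

def summarize_diff(d: dict) -> str:
    return (f"{len(d['added'])} added, {len(d['removed'])} removed, "
            f"{len(d['reranked'])} reranked")
```

For example, diffing `["a", "b", "c", "d"]` against `["a", "c", "b", "e"]` reports one addition, one removal, and two reranked spans.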
Only re-embed changed files - unchanged content is automatically skipped:
# First ingest
avocado ingest ./docs --recursive
# Ingested 42 files → 387 spans
# Re-ingest after editing 3 files
avocado ingest ./docs --recursive
# Skipped 39 unchanged, Updated 3 files → 28 spans
Content-hash comparison ensures minimal re-embedding while keeping the index fresh.
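The content-hash comparison can be sketched as: hash each file, compare against the stored hash, and only re-embed on change. A minimal sketch (hypothetical helper, not the actual schema):

```python
import hashlib

def plan_ingest(files: dict[str, str],
                stored_hashes: dict[str, str]) -> tuple[list[str], list[str]]:
    """Split files into (skipped, to_reembed) by content hash.

    files: path -> current content; stored_hashes: path -> hash from
    the previous ingest, updated in place.
    """
    skipped, to_reembed = [], []
    for path, content in files.items():
        h = hashlib.sha256(content.encode()).hexdigest()
        if stored_hashes.get(path) == h:
            skipped.append(path)      # unchanged: keep existing spans
        else:
            to_reembed.append(path)   # new or edited: re-chunk and re-embed
            stored_hashes[path] = h
    return skipped, to_reembed
```

On a first pass everything lands in `to_reembed`; on a re-run after editing a few files, only those files are re-embedded, matching the "Skipped 39 unchanged" output above.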
Built-in support for golden set testing and quality metrics:
use avocado_core::{GoldenQuery, evaluate};
let queries = vec![
GoldenQuery {
query: "authentication".to_string(),
expected_paths: vec!["docs/auth.md".to_string()],
k: 10,
},
];
let summary = evaluate(&queries, &db, &index, &config).await?;
println!("Recall@10: {:.2}%", summary.mean_recall * 100.0);
println!("MRR: {:.3}", summary.mean_mrr);
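The two metrics reported above are standard retrieval measures; they can be computed as follows, assuming `retrieved` is the ranked list of paths returned for a golden query:

```python
def recall_at_k(retrieved: list[str], expected: list[str], k: int) -> float:
    """Fraction of expected paths found in the top-k results."""
    top_k = set(retrieved[:k])
    return sum(p in top_k for p in expected) / len(expected)

def mrr(retrieved: list[str], expected: list[str]) -> float:
    """Reciprocal rank of the first relevant result (0 if none found)."""
    for rank, path in enumerate(retrieved, start=1):
        if path in expected:
            return 1.0 / rank
    return 0.0

# Expected doc appears at rank 2: perfect recall@10, MRR of 0.5
assert recall_at_k(["x", "docs/auth.md"], ["docs/auth.md"], k=10) == 1.0
assert mrr(["x", "docs/auth.md"], ["docs/auth.md"]) == 0.5
```

Averaging these over all golden queries yields the `mean_recall` and `mean_mrr` fields of the summary.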
NEW in v2.0: Multi-turn conversation tracking with context compilation
AvocadoDB now supports session management, enabling AI agents to maintain conversation history and context across multiple interactions.
from avocado import AvocadoDB
db = AvocadoDB(mode="http")
# Create a session
session = db.create_session(user_id="alice", title="Project Q&A")
# Multi-turn conversation
result = session.compile("What is AvocadoDB?", budget=8000)
session.add_message("assistant", "AvocadoDB is a deterministic context database...")
result2 = session.compile("How does the compiler work?")
session.add_message("assistant", "The compiler uses hybrid search...")
# Get conversation history
history = session.get_history()
# Replay for debugging
replay = session.replay()
See SESSION_MANAGEMENT.md for complete documentation of the Session class.
When a RAG system returns different context for the same query, agent behavior becomes impossible to reproduce, cache, or debug. AvocadoDB fixes this with deterministic compilation - same query, same context, every time.
# Run the same query multiple times
avocado compile "authentication" --budget 8000 | head -100 | sha256sum
# e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
avocado compile "authentication" --budget 8000 | head -100 | sha256sum
# e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
# Same hash every single time! ✅
Phase 1 achieves production-ready performance:
| Metric | Target | Actual | Status |
|---|---|---|---|
| Compilation time (8K tokens) | < 500ms | ~50ms avg | ✅ 10x faster |
| Token budget utilization | > 95% | 90-95% | ✅ Excellent |
| Determinism | 100% | 100% | ✅ Perfect |
| Duplicate spans | 0 | 0 | ✅ Perfect |
Breakdown for 8K token budget compilation (with Pure Rust embeddings):
Embed query: 1-5ms (2-5% of total) - Pure Rust (fastembed), local
Semantic search: <1ms (Vector similarity, HNSW)
Lexical search: <1ms (SQL LIKE query)
Hybrid fusion: <1ms (RRF score combination)
MMR diversification: 5-10ms (Diversity selection)
Token packing: <1ms (Greedy budget allocation)
Deterministic sort: <1ms (Stable sort)
Build context: <1ms (Text concatenation)
Count tokens: 30-40ms (tiktoken encoding)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
TOTAL: 40-60ms (6x faster than OpenAI!)
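The MMR diversification step (5-10ms above) trades relevance against redundancy. A generic sketch of the algorithm, assuming a relevance score per span and a pairwise span similarity (not AvocadoDB's exact scoring):

```python
def mmr_select(candidates, relevance, similarity, lam=0.5, n=5):
    """Maximal Marginal Relevance: greedily pick spans that are relevant
    to the query but dissimilar to already-selected spans.

    lam trades off relevance vs. diversity (1.0 = pure relevance,
    0.0 = pure diversity), mirroring the --mmr-lambda flag.
    """
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < n:
        def mmr_score(c):
            # Penalize a candidate by its worst-case overlap with picks so far
            redundancy = max((similarity(c, s) for s in selected), default=0.0)
            return lam * relevance[c] - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With a low lambda, a near-duplicate of an already-selected span loses to a less relevant but novel one, which is exactly the effect `--mmr-lambda 0.3` aims for.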
Performance Comparison:
| Metric | Pure Rust (fastembed) | OpenAI API |
|---|---|---|
| Query Embedding | 1-5ms | 200-300ms |
| Total Compilation | 40-60ms | 240-360ms |
| Throughput | 200-1000 texts/sec | 3-5 batches/sec |
| Cost | Free | ~$0.0001/1K tokens |
| Rate Limits | None | Varies by tier |
| Offline | ✅ Yes | ❌ No |
| Quality | Good (384 dims) | Excellent (1536 dims) |
Pure Rust embeddings are 6x faster and completely free!
Optimization: all post-embedding algorithms run in <15ms total (highly optimized).
See docs/performance.md for detailed analysis and scaling characteristics.
avocado init

Initialize a new AvocadoDB database:
avocado init [--path <db-path>]
Creates .avocado/ directory with SQLite database and vector index.
avocado ingest

Ingest documents into the database:
avocado ingest <path> [--recursive]
Examples:
# Ingest single file
avocado ingest README.md
# Ingest entire directory recursively
avocado ingest docs/ --recursive
# Ingest specific file types
avocado ingest src/ --recursive --include "*.rs,*.md,*.toml"
The ingestion process:
avocado compile

Compile a deterministic context for a query:
avocado compile <query> [OPTIONS]
Options:
--budget <tokens>: Token budget (default: 8000)
--json: Output as JSON instead of human-readable format
--explain: Show explain plan with candidates at each pipeline stage
--mmr-lambda <0.0-1.0>: MMR diversity parameter (default: 0.5)
--semantic-weight <float>: Semantic search weight (default: 0.7)
--lexical-weight <float>: Lexical search weight (default: 0.3)

Examples:
# Basic compilation
avocado compile "How does authentication work?"
# Large context window
avocado compile "error handling patterns" --budget 16000
# Prioritize diversity over relevance
avocado compile "testing strategies" --mmr-lambda 0.3
# Tune search weights (more keyword matching)
avocado compile "API endpoints" --semantic-weight 0.5 --lexical-weight 0.5
# JSON output for programmatic use
avocado compile "authentication" --budget 8000 --json
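The token budget is filled by the greedy packing step mentioned in the pipeline breakdown; a minimal sketch of the idea (illustrative, not AvocadoDB's exact packing logic):

```python
def pack_spans(ranked_spans, budget):
    """Greedily pack ranked spans into a token budget.

    ranked_spans: (span_id, token_count) pairs, best-first.
    A span that does not fit is skipped, so smaller spans further
    down the ranking can still use the remaining budget.
    """
    packed, used = [], 0
    for span_id, tokens in ranked_spans:
        if used + tokens <= budget:
            packed.append(span_id)
            used += tokens
    return packed, used

packed, used = pack_spans([("a", 5000), ("b", 4000), ("c", 2500)], budget=8000)
# "b" (4000 tokens) no longer fits after "a" (5000), but "c" (2500) does
```

Skipping oversized spans rather than stopping at the first miss is what pushes utilization toward the >90% figures reported above.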
JSON Output Format:
{
"text": "[1] docs/auth.md\nLines 1-23\n\n# Authentication...",
"spans": [
{
"id": "uuid",
"artifact_id": "uuid",
"start_line": 1,
"end_line": 23,
"text": "# Authentication...",
"embedding": [0.002, 0.013, ...],
"embedding_model": "text-embedding-ada-002",
"token_count": 127,
"metadata": null
}
],
"citations": [
{
"span_id": "uuid",
"artifact_id": "uuid",
"artifact_path": "docs/auth.md",
"start_line": 1,
"end_line": 23,
"score": 0.0
}
],
"tokens_used": 2232,
"query": "authentication",
"compilation_time_ms": 243
}
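The JSON output is convenient to consume programmatically; for example, a sketch that extracts a citation list (field names as in the format above):

```python
import json

def list_citations(compile_json: str) -> list[str]:
    """Return 'path:start-end' strings from avocado compile --json output."""
    result = json.loads(compile_json)
    return [
        f"{c['artifact_path']}:{c['start_line']}-{c['end_line']}"
        for c in result["citations"]
    ]

sample = ('{"citations": [{"span_id": "u", "artifact_id": "u", '
          '"artifact_path": "docs/auth.md", "start_line": 1, '
          '"end_line": 23, "score": 0.0}], "tokens_used": 2232}')
assert list_citations(sample) == ["docs/auth.md:1-23"]
```

This makes it straightforward to surface line-level citations alongside an LLM answer.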
avocado stats

Show database statistics:
avocado stats
Example output:
Database Statistics:
Artifacts: 42
Spans: 387
Total Tokens: 125,431
Average Tokens/Span: 324
avocado clear

Clear all data from the database:
avocado clear
Warning: This permanently deletes all ingested documents and embeddings!
Use AvocadoDB as a library in your Rust projects:
[dependencies]
avocado-core = "2.2"
tokio = { version = "1.35", features = ["full"] }
use avocado_core::{Database, VectorIndex, compiler, types::CompilerConfig};
#[tokio::main]
async fn main() -> avocado_core::types::Result<()> {
// Open database
let db = Database::new(".avocado/db.sqlite")?;
// Load vector index from database
let index = VectorIndex::from_database(&db)?;
// Configure compilation
let config = CompilerConfig {
token_budget: 8000,
semantic_weight: 0.7,
lexical_weight: 0.3,
mmr_lambda: 0.5,
enable_mmr: true,
};
// Compile context
let working_set = compiler::compile(
"How does authentication work?",
config,
&db,
&index,
Some("your-openai-api-key")
).await?;
println!("Compiled {} spans using {} tokens",
working_set.spans.len(),
working_set.tokens_used
);
println!("Deterministic hash: {}", working_set.deterministic_hash());
// Use working_set.text in your LLM prompt
println!("Context:\n{}", working_set.text);
Ok(())
}
avocadodb/
├── avocado-core/ # Core engine (Rust library)
├── avocado-cli/ # Command-line tool
├── avocado-server/ # HTTP server (REST API)
├── avocado-pgext/ # PostgreSQL extension (pgrx)
├── python/ # Python SDK
├── migrations/ # Database schema
├── tests/ # Integration tests
└── docs/ # Documentation
# Unit tests
cargo test
# Integration tests (requires OPENAI_API_KEY)
cargo test --test determinism -- --ignored
cargo test --test performance -- --ignored
cargo test --test correctness -- --ignored
# Development build
cargo build
# Release build
cargo build --release
# Run CLI
cargo run --bin avocado -- --help
# Run server
cargo run --bin avocado-server
We welcome contributions! See CONTRIBUTING.md for guidelines.
MIT License - see LICENSE for details.
AvocadoDB includes comprehensive test suites to validate determinism and performance:
# Run all tests and generate report
./scripts/run-tests.sh
# Run determinism validation only (100 iterations)
./scripts/test-determinism.sh
# Run performance benchmarks
./scripts/benchmark.sh
See docs/testing.md for complete testing documentation.
Built by the AvocadoDB Team | Making retrieval deterministic, one context at a time.