| Field | Value |
|---|---|
| Crates.io | avocado-server |
| lib.rs | avocado-server |
| version | 2.2.0 |
| created_at | 2025-12-09 16:02:25.756309+00 |
| updated_at | 2025-12-10 17:23:29.87812+00 |
| description | HTTP server for AvocadoDB - deterministic context compilation for AI agents |
| homepage | https://avocadodb.ai |
| repository | https://github.com/avocadodb/avocadodb |
| max_upload_size | |
| id | 1975684 |
| size | 188,457 |
The first deterministic context database for AI agents
Fix your RAG in 5 minutes - same query, same context, every time.
AvocadoDB is a span-based context compiler that replaces traditional vector databases' chaotic "top-k" retrieval with deterministic, citation-backed context generation.
Pure Rust embeddings = 6x faster than OpenAI, works completely offline, costs $0.
Current RAG systems are fundamentally broken: the same query can return different context on every run, making agent behavior impossible to reproduce or debug.
# Run benchmarks on your hardware
./target/release/avocado benchmark
# Results (M1 Mac example):
# Single embedding: 1.2ms (vs ~250ms OpenAI)
# Batch of 100: 8.7ms (vs ~250ms OpenAI)
# Full compilation: 43ms (vs ~300ms OpenAI)
#
# Speedup: 6-7x faster ⚡
# Cost: $0 (vs ~$0.0001 per 1K tokens)
See EMBEDDING_PERFORMANCE.md for detailed benchmarks.
cargo install avocado-cli
That's it! Now you can use avocado directly:
avocado --version
avocado init
avocado ingest ./docs --recursive
avocado compile "your query"
Run the server with Docker:
# Run with Docker
docker run -d \
-p 8765:8765 \
-v avocado-data:/data \
--name avocadodb \
avocadodb/avocadodb:latest
# Or use Docker Compose
docker-compose up -d
# Test the server
curl http://localhost:8765/health
See Docker Guide for complete documentation.
# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# Clone and build
git clone https://github.com/avocadodb/avocadodb
cd avocadodb
cargo build --release
# Optional: Set OpenAI API key (only if you want to use OpenAI embeddings)
# By default, AvocadoDB uses local embeddings (no API key required, no Python required!)
#
# Local embeddings strategy (automatic, in priority order):
# 1. Pure Rust with fastembed (semantic, good quality, no Python required) ✅ DEFAULT
# - Uses all-MiniLM-L6-v2 model (384 dimensions) by default
# - ONNX-based, fast and efficient
# - Model downloaded automatically on first use (~90MB)
# - To increase dimensionality, set AVOCADODB_EMBEDDING_MODEL:
# * "nomic" or "nomicv15" → 768 dimensions (good balance)
# * "bgelarge" or "bge-large-en-v1.5" → 1024 dimensions (higher quality)
# 2. Python + sentence-transformers (fallback if fastembed unavailable)
# - Requires: pip install sentence-transformers
# 3. Hash-based fallback (deterministic, but NOT semantic)
# - Works always, but poor semantic quality
#
# To use OpenAI embeddings instead:
# export OPENAI_API_KEY="sk-..."
# export AVOCADODB_EMBEDDING_PROVIDER=openai
# Initialize database
./target/release/avocado init
# Get model recommendation (optional)
./target/release/avocado recommend --corpus-size 5000 --use-case production
# Recommends optimal embedding model for your use case
# Ingest documents
./target/release/avocado ingest ./docs --recursive
# Output: Ingested 42 files → 387 spans
# Compile context (uses daemon at http://localhost:8765 by default)
./target/release/avocado compile "How does authentication work?" --budget 8000
# Force local mode (uses .avocado/db.sqlite in current project)
./target/release/avocado compile "How does authentication work?" --local --budget 8000
# Run performance benchmarks
./target/release/avocado benchmark
# Shows real performance on your hardware
# Start the daemon with remote GPU embeddings (Modal)
avocado serve --gpu --embed-url https://<your-modal-endpoint>/embed
# or CPU/local (default)
avocado serve
Example Output:
Compiling context for: "How does authentication work?"
Token budget: 8000
[1] docs/authentication.md
Lines 1-23
# Authentication System
Our authentication uses JWT tokens with secure refresh mechanisms...
---
[2] src/middleware/auth.ts
Lines 45-78
export function authenticateRequest(req: Request) {
const token = req.headers.authorization?.split(' ')[1];
if (!token) throw new UnauthorizedError();
...
}
---
Compiled 12 spans using 7,891 tokens (98.6% utilization)
Compilation time: 243ms
Context hash: e3b0c4429...52b855 (deterministic ✓)
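The context hash above can be reproduced by hashing the compiled text. A minimal sketch of the idea, assuming a SHA-256 digest over the final context string (consistent with the 64-hex-character hash shown):

```python
import hashlib

def context_hash(compiled_text: str) -> str:
    """SHA-256 over the compiled context: identical input, identical hash."""
    return hashlib.sha256(compiled_text.encode("utf-8")).hexdigest()

# Determinism check: hashing the same compiled context twice
h1 = context_hash("[1] docs/authentication.md\nLines 1-23\n...")
h2 = context_hash("[1] docs/authentication.md\nLines 1-23\n...")
assert h1 == h2
```

Because the compiler's output is byte-for-byte stable, the hash doubles as a cache key and a regression check across runs.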
cd sdks/python
pip install -e .
from avocado import AvocadoDB
db = AvocadoDB()
db.ingest("./docs", recursive=True)
result = db.compile("my query", budget=8000)
print(result.text) # Deterministic every time
cd sdks/typescript
npm install
npm run build
import { AvocadoDB } from 'avocadodb';
const db = new AvocadoDB();
await db.ingest('./docs', { recursive: true });
const result = await db.compile('my query', { budget: 8000 });
console.log(result.text); // Deterministic every time
# Start server (binds to 127.0.0.1 by default)
./target/release/avocado-server
# Use the API
curl -X POST http://localhost:8765/compile \
-H "Content-Type: application/json" \
-d '{"query": "authentication", "token_budget": 8000, "project": "'"$PWD"'"}'
AvocadoDB is production-ready with full Docker and Kubernetes support.
Two Docker Images Available:
| Image | Contents | Use Case |
|---|---|---|
| avocadodb/avocadodb:latest | Rust server + SQLite | Standalone, zero-config |
| avocadodb/postgres:pg16 | PostgreSQL + pgvector + avocado extension | Production, native SQL |
# Quick start with Docker (standalone)
docker run -d -p 8765:8765 -v avocado-data:/data avocadodb/avocadodb:latest
# Or use Docker Compose
docker-compose up -d
# PostgreSQL Extension (native SQL)
docker compose --profile pgext up -d postgres-avocado
# Then connect: psql postgres://avocado:changeme@localhost:5432/avocadodb
Features:
See Docker Guide for complete documentation.
AvocadoDB is available as a native PostgreSQL extension, enabling deterministic context compilation directly in SQL:
-- Enable extensions
CREATE EXTENSION vector;
CREATE EXTENSION avocado;
-- Configure embedding provider (optional - defaults to fastembed)
SELECT avocado_set_embedding_provider('ollama');
SELECT avocado_set_ollama_config('http://localhost:11434', 'bge-m3');
-- Ingest documents
SELECT avocado_ingest_artifact('docs/auth.md', 'Authentication uses JWT tokens...');
-- Compile context directly in SQL
SELECT avocado_compile('How does authentication work?', '{"token_budget": 4000}'::jsonb);
-- Session management
SELECT avocado_create_session('user@example.com', 'Support Chat');
SELECT avocado_add_message('session-id', 'user', 'How do I login?', NULL);
SELECT avocado_get_conversation_history('session-id', 8000);
-- Multi-agent orchestration
SELECT avocado_register_agent('moderator', 'Tech Moderator', 'gpt-4', 'You are a moderator...');
SELECT avocado_get_agent_relations('session-id');
-- Check stats
SELECT avocado_stats();
Features:
# Deploy to Kubernetes
kubectl apply -k k8s/
# Verify deployment
kubectl get pods -l app=avocadodb
Includes:
See Kubernetes Guide for complete documentation.
| Variable | Default | Description |
|---|---|---|
| PORT | 8765 | HTTP server port |
| BIND_ADDR | 127.0.0.1 | Bind address (set 0.0.0.0 to expose publicly) |
| RUST_LOG | info | Log level |
| AVOCADODB_EMBEDDING_PROVIDER | local | Provider: local, ollama, or openai |
| AVOCADODB_EMBEDDING_MODEL | minilm | Fastembed model (minilm, nomic, bgelarge) |
| AVOCADODB_OLLAMA_URL | http://localhost:11434 | Ollama server URL |
| AVOCADODB_OLLAMA_MODEL | bge-m3 | Ollama model name |
| OPENAI_API_KEY | - | OpenAI API key (if using OpenAI) |
| AVOCADODB_ROOT | unset | Optional project root. When set, all project paths must be under this directory; requests outside it are rejected. |
| API_TOKEN | unset | If set, the X-Avocado-Token header must be present and match on all routes (except /health and /api-docs/*). |
| MAX_BODY_BYTES | 2097152 (2MB) | Request body size limit to protect against large payloads. |
Security note: the server binds to 127.0.0.1 by default. Only set BIND_ADDR=0.0.0.0 if you front it with auth (e.g. API_TOKEN). Clients send a project path (their current working directory); the server normalizes paths and can restrict them to AVOCADODB_ROOT.

Compilation pipeline:

Query → Embed → [Semantic Search + Lexical Search] → Hybrid Fusion
→ MMR Diversification → Token Packing → Deterministic Sort → WorkingSet

Final spans are sorted by (artifact_id, start_line) for reproducibility.

NEW in v2.1: Enhanced determinism, explainability, and quality tracking features based on production feedback.
Every compilation now includes a version manifest for full reproducibility:
// Access manifest from WorkingSet
let manifest = working_set.manifest.unwrap();
println!("Avocado version: {}", manifest.avocado_version);
println!("Embedding model: {}", manifest.embedding_model);
println!("Context hash: {}", manifest.context_hash);
The manifest includes: avocado version, tokenizer, embedding model, embedding dimensions, chunking params, index params, and a SHA256 context hash.
Understand exactly how context was selected with explain mode:
# CLI with explain
avocado compile "authentication" --explain
# Shows candidates at each pipeline stage:
# - Semantic search (top 50 from HNSW)
# - Lexical search (keyword matches)
# - Hybrid fusion (RRF combination)
# - MMR diversification
# - Token packing
# - Final deterministic order
# Python SDK
result = db.compile("auth", budget=8000, explain=True)
if result.explain:
print(f"Semantic candidates: {len(result.explain.semantic_candidates)}")
print(f"Final spans: {len(result.explain.final_order)}")
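One of the stages surfaced by the explain plan, hybrid fusion, merges the semantic and lexical rankings. A generic reciprocal rank fusion (RRF) sketch; AvocadoDB's exact weighting is not shown here, so treat this as illustrative:

```python
def rrf_fuse(semantic: list[str], lexical: list[str], k: int = 60) -> list[str]:
    """Combine two ranked lists with Reciprocal Rank Fusion.

    Each span earns 1 / (k + rank) per list it appears in; ties are
    broken by span id so the output order is fully deterministic.
    """
    scores: dict[str, float] = {}
    for ranked in (semantic, lexical):
        for rank, span_id in enumerate(ranked, start=1):
            scores[span_id] = scores.get(span_id, 0.0) + 1.0 / (k + rank)
    # Sort by score descending, then by span id for a stable order
    return sorted(scores, key=lambda s: (-scores[s], s))

fused = rrf_fuse(["a", "b", "c"], ["b", "d"])
# "b" ranks first because it appears in both lists
```

The explicit tie-break on span id is what keeps the fused order reproducible even when two spans score identically.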
Compare retrieval results across corpus versions for auditing:
use avocado_core::{diff_working_sets, summarize_diff};
let diff = diff_working_sets(&before, &after);
println!("{}", summarize_diff(&diff));
// Output: "3 added, 1 removed, 2 reranked"
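`diff_working_sets` and `summarize_diff` are Rust APIs; a hypothetical Python analogue over two ordered lists of span ids shows how such a diff could be computed (illustrative, not AvocadoDB's implementation):

```python
def diff_working_sets(before: list[str], after: list[str]) -> dict:
    """Compare two ordered working sets of span ids."""
    before_set, after_set = set(before), set(after)
    added = [s for s in after if s not in before_set]
    removed = [s for s in before if s not in after_set]
    common = [s for s in before if s in after_set]
    # A surviving span is "reranked" if its relative position changed
    after_pos = {s: i for i, s in enumerate(x for x in after if x in before_set)}
    reranked = [s for i, s in enumerate(common) if after_pos[s] != i]
    return {"added": added, "removed": removed, "reranked": reranked}

def summarize_diff(d: dict) -> str:
    return (f"{len(d['added'])} added, {len(d['removed'])} removed, "
            f"{len(d['reranked'])} reranked")
```

For example, diffing `["a", "b", "c", "d"]` against `["a", "c", "b", "e"]` reports one addition, one removal, and two reranked spans.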
Only re-embed changed files - unchanged content is automatically skipped:
# First ingest
avocado ingest ./docs --recursive
# Ingested 42 files → 387 spans
# Re-ingest after editing 3 files
avocado ingest ./docs --recursive
# Skipped 39 unchanged, Updated 3 files → 28 spans
Content-hash comparison ensures minimal re-embedding while keeping the index fresh.
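The content-hash comparison can be sketched as: hash each file, compare against the stored hash, and only re-embed on change. A minimal sketch (hypothetical helper, not the actual schema):

```python
import hashlib

def plan_ingest(files: dict[str, str],
                stored_hashes: dict[str, str]) -> tuple[list[str], list[str]]:
    """Split files into (skipped, to_reembed) by content hash.

    files: path -> current content; stored_hashes: path -> hash from
    the previous ingest, updated in place.
    """
    skipped, to_reembed = [], []
    for path, content in files.items():
        h = hashlib.sha256(content.encode()).hexdigest()
        if stored_hashes.get(path) == h:
            skipped.append(path)      # unchanged: keep existing spans
        else:
            to_reembed.append(path)   # new or edited: re-chunk and re-embed
            stored_hashes[path] = h
    return skipped, to_reembed
```

On a first pass everything lands in `to_reembed`; on a re-run after editing a few files, only those files are re-embedded, matching the "Skipped 39 unchanged" output above.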
Built-in support for golden set testing and quality metrics:
use avocado_core::{GoldenQuery, evaluate};
let queries = vec![
GoldenQuery {
query: "authentication".to_string(),
expected_paths: vec!["docs/auth.md".to_string()],
k: 10,
},
];
let summary = evaluate(&queries, &db, &index, &config).await?;
println!("Recall@10: {:.2}%", summary.mean_recall * 100.0);
println!("MRR: {:.3}", summary.mean_mrr);
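The two metrics reported above are standard retrieval measures; they can be computed as follows, assuming `retrieved` is the ranked list of paths returned for a golden query:

```python
def recall_at_k(retrieved: list[str], expected: list[str], k: int) -> float:
    """Fraction of expected paths found in the top-k results."""
    top_k = set(retrieved[:k])
    return sum(p in top_k for p in expected) / len(expected)

def mrr(retrieved: list[str], expected: list[str]) -> float:
    """Reciprocal rank of the first relevant result (0 if none found)."""
    for rank, path in enumerate(retrieved, start=1):
        if path in expected:
            return 1.0 / rank
    return 0.0

# Expected doc appears at rank 2: perfect recall@10, MRR of 0.5
assert recall_at_k(["x", "docs/auth.md"], ["docs/auth.md"], k=10) == 1.0
assert mrr(["x", "docs/auth.md"], ["docs/auth.md"]) == 0.5
```

Averaging these over all golden queries yields the `mean_recall` and `mean_mrr` fields of the summary.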
NEW in v2.0: Multi-turn conversation tracking with context compilation
AvocadoDB now supports session management, enabling AI agents to maintain conversation history and context across multiple interactions.
from avocado import AvocadoDB
db = AvocadoDB(mode="http")
# Create a session
session = db.create_session(user_id="alice", title="Project Q&A")
# Multi-turn conversation
result = session.compile("What is AvocadoDB?", budget=8000)
session.add_message("assistant", "AvocadoDB is a deterministic context database...")
result2 = session.compile("How does the compiler work?")
session.add_message("assistant", "The compiler uses hybrid search...")
# Get conversation history
history = session.get_history()
# Replay for debugging
replay = session.replay()
See SESSION_MANAGEMENT.md for complete documentation of the Session class.
When a RAG system returns different context for the same query, agent behavior becomes impossible to reproduce, cache, or debug. AvocadoDB fixes this with deterministic compilation - same query, same context, every time.
# Run the same query multiple times
avocado compile "authentication" --budget 8000 | head -100 | sha256sum
# e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
avocado compile "authentication" --budget 8000 | head -100 | sha256sum
# e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
# Same hash every single time! ✅
Phase 1 achieves production-ready performance:
| Metric | Target | Actual | Status |
|---|---|---|---|
| Compilation time (8K tokens) | < 500ms | ~50ms avg | ✅ 10x faster |
| Token budget utilization | > 95% | 90-95% | ✅ Excellent |
| Determinism | 100% | 100% | ✅ Perfect |
| Duplicate spans | 0 | 0 | ✅ Perfect |
Breakdown for 8K token budget compilation (with Pure Rust embeddings):
Embed query: 1-5ms (2-5% of total) - Pure Rust (fastembed), local
Semantic search: <1ms (Vector similarity, HNSW)
Lexical search: <1ms (SQL LIKE query)
Hybrid fusion: <1ms (RRF score combination)
MMR diversification: 5-10ms (Diversity selection)
Token packing: <1ms (Greedy budget allocation)
Deterministic sort: <1ms (Stable sort)
Build context: <1ms (Text concatenation)
Count tokens: 30-40ms (tiktoken encoding)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
TOTAL: 40-60ms (6x faster than OpenAI!)
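The MMR diversification step (5-10ms above) trades relevance against redundancy. A generic sketch of the algorithm, assuming a relevance score per span and a pairwise span similarity (not AvocadoDB's exact scoring):

```python
def mmr_select(candidates, relevance, similarity, lam=0.5, n=5):
    """Maximal Marginal Relevance: greedily pick spans that are relevant
    to the query but dissimilar to already-selected spans.

    lam trades off relevance vs. diversity (1.0 = pure relevance,
    0.0 = pure diversity), mirroring the --mmr-lambda flag.
    """
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < n:
        def mmr_score(c):
            # Penalize a candidate by its worst-case overlap with picks so far
            redundancy = max((similarity(c, s) for s in selected), default=0.0)
            return lam * relevance[c] - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With a low lambda, a near-duplicate of an already-selected span loses to a less relevant but novel one, which is exactly the effect `--mmr-lambda 0.3` aims for.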
Performance Comparison:
| Metric | Pure Rust (fastembed) | OpenAI API |
|---|---|---|
| Query Embedding | 1-5ms | 200-300ms |
| Total Compilation | 40-60ms | 240-360ms |
| Throughput | 200-1000 texts/sec | 3-5 batches/sec |
| Cost | Free | ~$0.0001/1K tokens |
| Rate Limits | None | Varies by tier |
| Offline | ✅ Yes | ❌ No |
| Quality | Good (384 dims) | Excellent (1536 dims) |
Pure Rust embeddings are 6x faster and completely free!
Optimization: all post-embedding algorithms run in <15ms total (highly optimized).
See docs/performance.md for detailed analysis and scaling characteristics.
avocado init

Initialize a new AvocadoDB database:
avocado init [--path <db-path>]
Creates .avocado/ directory with SQLite database and vector index.
avocado ingest

Ingest documents into the database:
avocado ingest <path> [--recursive]
Examples:
# Ingest single file
avocado ingest README.md
# Ingest entire directory recursively
avocado ingest docs/ --recursive
# Ingest specific file types
avocado ingest src/ --recursive --include "*.rs,*.md,*.toml"
The ingestion process:
avocado compile

Compile a deterministic context for a query:
avocado compile <query> [OPTIONS]
Options:
--budget <tokens>: Token budget (default: 8000)
--json: Output as JSON instead of human-readable format
--explain: Show explain plan with candidates at each pipeline stage
--mmr-lambda <0.0-1.0>: MMR diversity parameter (default: 0.5)
--semantic-weight <float>: Semantic search weight (default: 0.7)
--lexical-weight <float>: Lexical search weight (default: 0.3)

Examples:
# Basic compilation
avocado compile "How does authentication work?"
# Large context window
avocado compile "error handling patterns" --budget 16000
# Prioritize diversity over relevance
avocado compile "testing strategies" --mmr-lambda 0.3
# Tune search weights (more keyword matching)
avocado compile "API endpoints" --semantic-weight 0.5 --lexical-weight 0.5
# JSON output for programmatic use
avocado compile "authentication" --budget 8000 --json
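The token budget is filled by the greedy packing step mentioned in the pipeline breakdown; a minimal sketch of the idea (illustrative, not AvocadoDB's exact packing logic):

```python
def pack_spans(ranked_spans, budget):
    """Greedily pack ranked spans into a token budget.

    ranked_spans: (span_id, token_count) pairs, best-first.
    A span that does not fit is skipped, so smaller spans further
    down the ranking can still use the remaining budget.
    """
    packed, used = [], 0
    for span_id, tokens in ranked_spans:
        if used + tokens <= budget:
            packed.append(span_id)
            used += tokens
    return packed, used

packed, used = pack_spans([("a", 5000), ("b", 4000), ("c", 2500)], budget=8000)
# "b" (4000 tokens) no longer fits after "a" (5000), but "c" (2500) does
```

Skipping oversized spans rather than stopping at the first miss is what pushes utilization toward the >90% figures reported above.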
JSON Output Format:
{
"text": "[1] docs/auth.md\nLines 1-23\n\n# Authentication...",
"spans": [
{
"id": "uuid",
"artifact_id": "uuid",
"start_line": 1,
"end_line": 23,
"text": "# Authentication...",
"embedding": [0.002, 0.013, ...],
"embedding_model": "text-embedding-ada-002",
"token_count": 127,
"metadata": null
}
],
"citations": [
{
"span_id": "uuid",
"artifact_id": "uuid",
"artifact_path": "docs/auth.md",
"start_line": 1,
"end_line": 23,
"score": 0.0
}
],
"tokens_used": 2232,
"query": "authentication",
"compilation_time_ms": 243
}
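The JSON output is convenient to consume programmatically; for example, a sketch that extracts a citation list (field names as in the format above):

```python
import json

def list_citations(compile_json: str) -> list[str]:
    """Return 'path:start-end' strings from avocado compile --json output."""
    result = json.loads(compile_json)
    return [
        f"{c['artifact_path']}:{c['start_line']}-{c['end_line']}"
        for c in result["citations"]
    ]

sample = ('{"citations": [{"span_id": "u", "artifact_id": "u", '
          '"artifact_path": "docs/auth.md", "start_line": 1, '
          '"end_line": 23, "score": 0.0}], "tokens_used": 2232}')
assert list_citations(sample) == ["docs/auth.md:1-23"]
```

This makes it straightforward to surface line-level citations alongside an LLM answer.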
avocado stats

Show database statistics:
avocado stats
Example output:
Database Statistics:
Artifacts: 42
Spans: 387
Total Tokens: 125,431
Average Tokens/Span: 324
avocado clear

Clear all data from the database:
avocado clear
Warning: This permanently deletes all ingested documents and embeddings!
Use AvocadoDB as a library in your Rust projects:
[dependencies]
avocado-core = "2.2"
tokio = { version = "1.35", features = ["full"] }
use avocado_core::{Database, VectorIndex, compiler, types::CompilerConfig};
#[tokio::main]
async fn main() -> avocado_core::types::Result<()> {
// Open database
let db = Database::new(".avocado/db.sqlite")?;
// Load vector index from database
let index = VectorIndex::from_database(&db)?;
// Configure compilation
let config = CompilerConfig {
token_budget: 8000,
semantic_weight: 0.7,
lexical_weight: 0.3,
mmr_lambda: 0.5,
enable_mmr: true,
};
// Compile context
let working_set = compiler::compile(
"How does authentication work?",
config,
&db,
&index,
Some("your-openai-api-key")
).await?;
println!("Compiled {} spans using {} tokens",
working_set.spans.len(),
working_set.tokens_used
);
println!("Deterministic hash: {}", working_set.deterministic_hash());
// Use working_set.text in your LLM prompt
println!("Context:\n{}", working_set.text);
Ok(())
}
avocadodb/
├── avocado-core/ # Core engine (Rust library)
├── avocado-cli/ # Command-line tool
├── avocado-server/ # HTTP server (REST API)
├── avocado-pgext/ # PostgreSQL extension (pgrx)
├── python/ # Python SDK
├── migrations/ # Database schema
├── tests/ # Integration tests
└── docs/ # Documentation
# Unit tests
cargo test
# Integration tests (requires OPENAI_API_KEY)
cargo test --test determinism -- --ignored
cargo test --test performance -- --ignored
cargo test --test correctness -- --ignored
# Development build
cargo build
# Release build
cargo build --release
# Run CLI
cargo run --bin avocado -- --help
# Run server
cargo run --bin avocado-server
We welcome contributions! See CONTRIBUTING.md for guidelines.
MIT License - see LICENSE for details.
AvocadoDB includes comprehensive test suites to validate determinism and performance:
# Run all tests and generate report
./scripts/run-tests.sh
# Run determinism validation only (100 iterations)
./scripts/test-determinism.sh
# Run performance benchmarks
./scripts/benchmark.sh
See docs/testing.md for complete testing documentation.
Built by the AvocadoDB Team | Making retrieval deterministic, one context at a time.