| | |
|---|---|
| Crates.io | trueno-rag-cli |
| lib.rs | trueno-rag-cli |
| version | 0.1.4 |
| created_at | 2025-11-30 18:38:35.333201+00 |
| updated_at | 2026-01-11 11:58:16.631123+00 |
| description | CLI for Trueno-RAG pipeline |
| homepage | |
| repository | https://github.com/paiml/trueno-rag |
| max_upload_size | |
| id | 1958601 |
| size | 160,527 |
SIMD-accelerated RAG pipeline built on Trueno compute primitives. Part of the Sovereign AI Stack.
```toml
[dependencies]
trueno-rag = "0.1.8"
```
```rust
use trueno_rag::{
    pipeline::RagPipelineBuilder,
    chunk::RecursiveChunker,
    embed::MockEmbedder,
    rerank::LexicalReranker,
    fusion::FusionStrategy,
    Document,
};

let mut pipeline = RagPipelineBuilder::new()
    .chunker(RecursiveChunker::new(512, 50))
    .embedder(MockEmbedder::new(384))
    .reranker(LexicalReranker::new())
    .fusion(FusionStrategy::RRF { k: 60.0 })
    .build()?;

let doc = Document::new("Your content here...").with_title("Doc Title");
pipeline.index_document(&doc)?;

let (results, context) = pipeline.query_with_context("your query", 5)?;
```
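The `FusionStrategy::RRF { k: 60.0 }` setting refers to Reciprocal Rank Fusion, which merges rankings from multiple retrievers (e.g. BM25 and vector search) by scoring each document as the sum of `1 / (k + rank)` over all rankings. A minimal stdlib-only sketch of the formula, independent of the trueno-rag API (the crate's internals may differ):

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank(d)).
/// `rankings` are document ids ordered best-first; `k` dampens the influence
/// of top ranks (60.0 is the common default).
fn rrf(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (rank, doc) in ranking.iter().enumerate() {
            // ranks are 1-based in the RRF formula
            *scores.entry((*doc).to_string()).or_insert(0.0) += 1.0 / (k + (rank + 1) as f64);
        }
    }
    let mut fused: Vec<_> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    fused
}

fn main() {
    // Two retrievers rank overlapping documents.
    let bm25 = vec!["d1", "d2", "d3"];
    let vector = vec!["d3", "d1", "d4"];
    for (doc, score) in rrf(&[bm25, vector], 60.0) {
        println!("{doc}: {score:.5}");
    }
}
```

A document ranked well by both retrievers (`d1` here) outscores one ranked first by only a single retriever, which is what makes RRF robust for hybrid search.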
```sh
# Basic examples
cargo run --example basic_rag
cargo run --example chunking_strategies
cargo run --example hybrid_search
cargo run --example metrics_evaluation

# With semantic embeddings (downloads ~90MB ONNX model on first run)
cargo run --example semantic_embeddings --features embeddings

# With compressed index persistence
cargo run --example compressed_index --features compression

# With NVIDIA Nemotron embeddings (requires GGUF model file)
NEMOTRON_MODEL_PATH=/path/to/model.gguf cargo run --example nemotron_embeddings --features nemotron
```
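The `RecursiveChunker::new(512, 50)` in the quick start takes a chunk size and an overlap. A simplified, non-recursive sliding-window sketch of overlap-based chunking (illustrative only; the crate's chunker also respects separator boundaries):

```rust
/// Split `text` into character windows of `size`, with `overlap` characters
/// shared between consecutive chunks so context is not cut at hard edges.
fn chunk(text: &str, size: usize, overlap: usize) -> Vec<String> {
    assert!(overlap < size, "overlap must be smaller than chunk size");
    let chars: Vec<char> = text.chars().collect();
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + size).min(chars.len());
        chunks.push(chars[start..end].iter().collect());
        if end == chars.len() {
            break;
        }
        start = end - overlap; // step back by the overlap
    }
    chunks
}

fn main() {
    for c in chunk("abcdefghij", 4, 1) {
        println!("{c}");
    }
}
```

With `size = 512` and `overlap = 50`, each chunk repeats the last 50 characters of its predecessor, which helps retrieval when a relevant passage straddles a chunk boundary.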
Production-quality vector embeddings via FastEmbed (ONNX Runtime):
```toml
trueno-rag = { version = "0.1.8", features = ["embeddings"] }
```
```rust
use trueno_rag::embed::{FastEmbedder, EmbeddingModelType, Embedder};

let embedder = FastEmbedder::new(EmbeddingModelType::AllMiniLmL6V2)?;
let embedding = embedder.embed("Hello, world!")?;
// 384-dimensional embeddings
```
Available models:
- `AllMiniLmL6V2` - Fast, 384 dims (default)
- `AllMiniLmL12V2` - Better quality, 384 dims
- `BgeSmallEnV15` - Balanced, 384 dims
- `BgeBaseEnV15` - Higher quality, 768 dims
- `NomicEmbedTextV1` - Retrieval optimized, 768 dims

High-quality 4096-dimensional embeddings via GGUF model inference:
```toml
trueno-rag = { version = "0.1.8", features = ["nemotron"] }
```
```rust
use trueno_rag::embed::{NemotronEmbedder, NemotronConfig, Embedder};

let config = NemotronConfig::new("models/NV-Embed-v2-Q4_K.gguf")
    .with_gpu(true)
    .with_normalize(true);
let embedder = NemotronEmbedder::new(config)?;

// Asymmetric retrieval - different prefixes for queries vs documents
let query_emb = embedder.embed_query("What is machine learning?")?;
let doc_emb = embedder.embed_document("Machine learning is a branch of AI...")?;
```
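Query and document embeddings are typically compared by cosine similarity; with `.with_normalize(true)` the vectors are unit-length, so cosine reduces to a plain dot product. A stdlib-only sketch (not part of the trueno-rag API):

```rust
/// Cosine similarity between two embedding vectors.
/// For unit-normalized vectors this equals the dot product.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

fn main() {
    // Toy 3-dimensional "embeddings" (real ones are 4096-dimensional here).
    let query = [0.6_f32, 0.8, 0.0];
    let close = [0.6_f32, 0.8, 0.0];
    let far = [0.0_f32, 0.0, 1.0];
    println!("close: {:.2}", cosine_similarity(&query, &close));
    println!("far:   {:.2}", cosine_similarity(&query, &far));
}
```

Identical directions score 1.0 and orthogonal directions score 0.0, which is the basis for ranking documents against a query embedding.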
LZ4/ZSTD compressed index persistence:
```toml
trueno-rag = { version = "0.1.8", features = ["compression"] }
```
```rust
use trueno_rag::{compressed::Compression, BM25Index};

// Given a populated BM25Index named `index`:
let bytes = index.to_compressed_bytes(Compression::Zstd)?;
// 4-6x compression ratio
```
trueno-rag is part of the Sovereign AI Stack:
| Crate | Version | Purpose |
|---|---|---|
| trueno | 0.11 | SIMD/GPU compute primitives |
| trueno-db | 0.3.10 | GPU-first analytics database |
| realizar | 0.5.1 | GGUF/APR model inference |
| fastembed | 5.x | ONNX embeddings |
```sh
make test      # Run tests
make lint      # Clippy lints
make coverage  # Coverage report (95%+ target)
make book      # Build documentation book
```
License: MIT