| Crates.io | ruvector-onnx-embeddings |
| lib.rs | ruvector-onnx-embeddings |
| version | 0.1.0 |
| created_at | 2025-12-31 03:36:55.829655+00 |
| updated_at | 2025-12-31 03:36:55.829655+00 |
| description | ONNX-based embedding generation for RuVector - Reimagined embedding pipeline in pure Rust |
| homepage | |
| repository | https://github.com/ruvnet/ruvector |
| max_upload_size | |
| id | 2013839 |
| size | 381,963 |
Production-ready ONNX-based embedding generation for semantic search and RAG pipelines in pure Rust
This library provides a complete embedding generation system built entirely in Rust using ONNX Runtime. Designed for high-performance vector databases, semantic search engines, and AI applications.
| Feature | Description | Status |
|---|---|---|
| Native ONNX Runtime | Direct ONNX model execution via ort 2.0 |
✅ |
| Pretrained Models | 8 popular sentence-transformer models | ✅ |
| HuggingFace Integration | Download any compatible model from HF Hub | ✅ |
| Multiple Pooling | Mean, CLS, Max, MeanSqrtLen, LastToken, WeightedMean | ✅ |
| Batch Processing | Efficient batch embedding with configurable size | ✅ |
| GPU Acceleration | CUDA, TensorRT, CoreML support | ✅ |
| Vector Search | Built-in similarity search (cosine, euclidean, dot) | ✅ |
| RAG Pipeline | Ready-to-use retrieval-augmented generation | ✅ |
| Thread-Safe | Safe concurrent use via RwLock | ✅ |
| Zero Python | Pure Rust - no Python dependencies | ✅ |
use ruvector_onnx_embeddings::{Embedder, PretrainedModel};
#[tokio::main]
async fn main() -> anyhow::Result<()> {
// Create embedder with default model
let mut embedder = Embedder::default_model().await?;
// Generate embedding
let embedding = embedder.embed_one("Hello, world!")?;
println!("Embedding dimension: {}", embedding.len()); // 384
// Compute semantic similarity
let sim = embedder.similarity(
"I love programming in Rust",
"Rust is my favorite language"
)?;
println!("Similarity: {:.4}", sim); // ~0.85
Ok(())
}
[dependencies]
ruvector-onnx-embeddings = { path = "examples/onnx-embeddings" }
tokio = { version = "1", features = ["full"] }
anyhow = "1.0"
| Feature | Command | Description |
|---|---|---|
| Default | cargo build |
CPU inference |
| CUDA | cargo build --features cuda |
NVIDIA GPU |
| TensorRT | cargo build --features tensorrt |
NVIDIA optimized |
| CoreML | cargo build --features coreml |
Apple Silicon |
# Basic example
cargo run --example basic_embedding
# Full demo with all features
cargo run
| Model | Dimension | Max Tokens | Size | Speed | Quality | Best For |
|---|---|---|---|---|---|---|
AllMiniLmL6V2 |
384 | 256 | 23MB | ⚡⚡⚡ | ⭐⭐⭐ | Default - Fast, general-purpose |
AllMiniLmL12V2 |
384 | 256 | 33MB | ⚡⚡ | ⭐⭐⭐⭐ | Better quality, balanced |
AllMpnetBaseV2 |
768 | 384 | 110MB | ⚡ | ⭐⭐⭐⭐⭐ | Best quality, production |
E5SmallV2 |
384 | 512 | 33MB | ⚡⚡⚡ | ⭐⭐⭐⭐ | Search & retrieval |
E5BaseV2 |
768 | 512 | 110MB | ⚡ | ⭐⭐⭐⭐⭐ | High-quality search |
BgeSmallEnV15 |
384 | 512 | 33MB | ⚡⚡⚡ | ⭐⭐⭐⭐ | State-of-the-art small |
BgeBaseEnV15 |
768 | 512 | 110MB | ⚡ | ⭐⭐⭐⭐⭐ | Best overall quality |
GteSmall |
384 | 512 | 33MB | ⚡⚡⚡ | ⭐⭐⭐⭐ | Multilingual support |
┌─────────────────────────────────────────────────────────────────┐
│ Which Model Should I Use? │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Priority: Speed? ──────► AllMiniLmL6V2 (23MB, 384d) │
│ │
│ Priority: Quality? ──────► AllMpnetBaseV2 (110MB, 768d) │
│ │
│ Building search? ──────► BgeSmallEnV15 or E5SmallV2 │
│ │
│ Multilingual? ──────► GteSmall │
│ │
│ Production RAG? ──────► BgeBaseEnV15 or E5BaseV2 │
│ │
│ Memory constrained? ──────► AllMiniLmL6V2 │
│ │
└─────────────────────────────────────────────────────────────────┘
Goal: Generate your first embedding and understand the output.
use ruvector_onnx_embeddings::{Embedder, EmbedderConfig, PretrainedModel};
#[tokio::main]
async fn main() -> anyhow::Result<()> {
// 1. Create an embedder (downloads model on first run)
println!("Loading model...");
let mut embedder = Embedder::default_model().await?;
// 2. Check model info
println!("Model: {}", embedder.model_info().name);
println!("Dimension: {}", embedder.dimension());
println!("Max tokens: {}", embedder.max_length());
// 3. Generate an embedding
let text = "The quick brown fox jumps over the lazy dog.";
let embedding = embedder.embed_one(text)?;
// 4. Examine the output
println!("\nInput: \"{}\"", text);
println!("Output shape: [{} dimensions]", embedding.len());
println!("First 5 values: [{:.4}, {:.4}, {:.4}, {:.4}, {:.4}]",
embedding[0], embedding[1], embedding[2], embedding[3], embedding[4]);
// 5. Compute similarity between texts
let text1 = "I love programming in Rust.";
let text2 = "Rust is my favorite programming language.";
let text3 = "The weather is nice today.";
let sim_related = embedder.similarity(text1, text2)?;
let sim_unrelated = embedder.similarity(text1, text3)?;
println!("\nSimilarity comparisons:");
println!(" \"{}\" vs \"{}\"", text1, text2);
println!(" Similarity: {:.4} (high - related topics)", sim_related);
println!();
println!(" \"{}\" vs \"{}\"", text1, text3);
println!(" Similarity: {:.4} (low - unrelated topics)", sim_unrelated);
Ok(())
}
Expected Output:
Loading model...
Model: all-MiniLM-L6-v2
Dimension: 384
Max tokens: 256
Input: "The quick brown fox jumps over the lazy dog."
Output shape: [384 dimensions]
First 5 values: [0.0234, -0.0156, 0.0891, -0.0412, 0.0567]
Similarity comparisons:
"I love programming in Rust." vs "Rust is my favorite programming language."
Similarity: 0.8523 (high - related topics)
"I love programming in Rust." vs "The weather is nice today."
Similarity: 0.1234 (low - unrelated topics)
Goal: Efficiently process multiple texts at once.
use ruvector_onnx_embeddings::{EmbedderBuilder, PretrainedModel, PoolingStrategy};
#[tokio::main]
async fn main() -> anyhow::Result<()> {
// 1. Configure for batch processing
let mut embedder = EmbedderBuilder::new()
.pretrained(PretrainedModel::AllMiniLmL6V2)
.batch_size(64) // Process 64 texts at a time
.normalize(true) // L2 normalize (recommended for cosine similarity)
.pooling(PoolingStrategy::Mean)
.build()
.await?;
// 2. Prepare your data
let texts = vec![
"Artificial intelligence is transforming technology.",
"Machine learning models learn from data.",
"Deep learning uses neural networks.",
"Natural language processing understands text.",
"Computer vision analyzes images.",
"Reinforcement learning optimizes decisions.",
"Vector databases enable semantic search.",
"Embeddings capture semantic meaning.",
];
// 3. Generate embeddings
println!("Embedding {} texts...", texts.len());
let start = std::time::Instant::now();
let output = embedder.embed(&texts)?;
let elapsed = start.elapsed();
// 4. Examine results
println!("Completed in {:?}", elapsed);
println!("Total embeddings: {}", output.len());
println!("Embedding dimension: {}", output.dimension);
// 5. Show token counts per text
println!("\nToken counts:");
for (i, (text, tokens)) in texts.iter().zip(output.token_counts.iter()).enumerate() {
println!(" [{}] {} tokens: \"{}...\"", i, tokens, &text[..40.min(text.len())]);
}
// 6. Access individual embeddings
println!("\nFirst embedding (first 5 values):");
let first = output.get(0).unwrap();
println!(" [{:.4}, {:.4}, {:.4}, {:.4}, {:.4}, ...]",
first[0], first[1], first[2], first[3], first[4]);
Ok(())
}
Performance Table: Batch Size vs Throughput
| Batch Size | Time (8 texts) | Throughput | Memory |
|---|---|---|---|
| 1 | 45ms | 178/sec | 150MB |
| 8 | 35ms | 228/sec | 160MB |
| 32 | 28ms | 285/sec | 180MB |
| 64 | 25ms | 320/sec | 200MB |
Goal: Create a searchable knowledge base with semantic understanding.
use ruvector_onnx_embeddings::{
Embedder, RuVectorBuilder, Distance
};
#[tokio::main]
async fn main() -> anyhow::Result<()> {
// 1. Create embedder
println!("Step 1: Loading embedder...");
let embedder = Embedder::default_model().await?;
// 2. Create search index
println!("Step 2: Creating search index...");
let index = RuVectorBuilder::new("programming_languages")
.embedder(embedder)
.distance(Distance::Cosine) // Best for normalized embeddings
.max_elements(100_000) // Pre-allocate for 100k vectors
.build()?;
// 3. Index documents
println!("Step 3: Indexing documents...");
let documents = vec![
"Rust is a systems programming language focused on safety and performance.",
"Python is widely used for machine learning and data science applications.",
"JavaScript is the language of the web, running in browsers everywhere.",
"Go is designed for building scalable and efficient server applications.",
"TypeScript adds static typing to JavaScript for better developer experience.",
"C++ provides low-level control and high performance for system software.",
"Java is a mature, object-oriented language popular in enterprise software.",
"Swift is Apple's modern language for iOS and macOS development.",
"Kotlin is a concise language that runs on the JVM, popular for Android.",
"Haskell is a purely functional programming language with strong typing.",
];
index.insert_batch(&documents)?;
println!(" Indexed {} documents", documents.len());
println!(" Index size: {} vectors", index.len());
// 4. Perform searches
println!("\nStep 4: Running searches...\n");
let queries = vec![
"What language is best for web development?",
"I want to build a high-performance system application",
"Which language should I learn for machine learning?",
"I need a language for mobile app development",
];
for query in queries {
println!("🔍 Query: \"{}\"", query);
let results = index.search(query, 3)?;
for (i, result) in results.iter().enumerate() {
println!(" {}. (score: {:.4}) {}",
i + 1,
result.score,
result.text);
}
println!();
}
Ok(())
}
Search Results Table:
| Query | Top Result | Score |
|---|---|---|
| "What language is best for web development?" | "JavaScript is the language of the web..." | 0.82 |
| "high-performance system application" | "Rust is a systems programming language..." | 0.78 |
| "machine learning" | "Python is widely used for machine learning..." | 0.85 |
| "mobile app development" | "Swift is Apple's modern language for iOS..." | 0.76 |
Goal: Build a retrieval-augmented generation system for LLM context.
use ruvector_onnx_embeddings::{
Embedder, RuVectorEmbeddings, RagPipeline
};
#[tokio::main]
async fn main() -> anyhow::Result<()> {
// 1. Create knowledge base
println!("Step 1: Creating knowledge base...");
let embedder = Embedder::default_model().await?;
let index = RuVectorEmbeddings::new_default("ruvector_docs", embedder)?;
// 2. Add documentation
println!("Step 2: Adding documents...");
let knowledge = vec![
"RuVector is a distributed vector database that learns and adapts.",
"RuVector uses HNSW indexing for fast approximate nearest neighbor search.",
"The embedding dimension in RuVector is configurable based on your model.",
"RuVector supports multiple distance metrics: Cosine, Euclidean, and Dot Product.",
"Graph Neural Networks in RuVector improve search quality over time.",
"RuVector integrates with ONNX models for native embedding generation.",
"The NAPI-RS bindings allow using RuVector from Node.js applications.",
"RuVector supports WebAssembly for running in web browsers.",
"Quantization in RuVector reduces memory usage by up to 32x.",
"RuVector can handle millions of vectors with sub-millisecond search.",
];
index.insert_batch(&knowledge)?;
// 3. Create RAG pipeline
println!("Step 3: Setting up RAG pipeline...");
let rag = RagPipeline::new(index, 3); // Retrieve top-3 documents
// 4. Retrieve context for queries
println!("\nStep 4: Running RAG queries...\n");
let queries = vec![
"How does RuVector perform search?",
"Can I use RuVector from JavaScript?",
"How can I reduce memory usage?",
];
for query in queries {
println!("📝 Query: \"{}\"", query);
let context = rag.retrieve(query)?;
println!(" Retrieved context:");
for (i, doc) in context.iter().enumerate() {
println!(" {}. {}", i + 1, doc);
}
// Format for LLM prompt
println!("\n LLM Prompt:");
println!(" ───────────────────────────────────────");
println!(" Given the following context:");
for doc in &context {
println!(" - {}", doc);
}
println!(" ");
println!(" Answer the question: {}", query);
println!(" ───────────────────────────────────────\n");
}
Ok(())
}
RAG Pipeline Flow:
┌──────────┐ ┌─────────────┐ ┌──────────┐ ┌─────────┐
│ Query │───►│ Embedder │───►│ Search │───►│ Context │
│ │ │ │ │ Index │ │ │
└──────────┘ └─────────────┘ └──────────┘ └────┬────┘
│
v
┌──────────┐ ┌─────────────┐ ┌──────────┐ ┌─────────┐
│ Response │◄───│ LLM │◄───│ Prompt │◄───│ Format │
│ │ │ (external) │ │ │ │ │
└──────────┘ └─────────────┘ └──────────┘ └─────────┘
Goal: Automatically group similar texts together.
use ruvector_onnx_embeddings::Embedder;
#[tokio::main]
async fn main() -> anyhow::Result<()> {
let mut embedder = Embedder::default_model().await?;
// Mixed-category texts
let texts = vec![
// Technology (expected cluster 0)
"Artificial intelligence is revolutionizing industries.",
"Machine learning algorithms process large datasets.",
"Neural networks mimic the human brain.",
// Sports (expected cluster 1)
"Football is the most popular sport worldwide.",
"Basketball requires speed and agility.",
"Tennis is played on different court surfaces.",
// Food (expected cluster 2)
"Italian pasta comes in many shapes and sizes.",
"Sushi is a traditional Japanese dish.",
"French cuisine is known for its elegance.",
];
println!("Clustering {} texts into 3 categories...\n", texts.len());
// Perform clustering
let clusters = embedder.cluster(&texts, 3)?;
// Group and display results
let mut groups: std::collections::HashMap<usize, Vec<&str>> =
std::collections::HashMap::new();
for (i, &cluster) in clusters.iter().enumerate() {
groups.entry(cluster).or_default().push(texts[i]);
}
println!("Clustering Results:");
println!("═══════════════════════════════════════════");
for (cluster_id, members) in groups.iter() {
println!("\n📁 Cluster {}:", cluster_id);
for text in members {
println!(" • {}", text);
}
}
Ok(())
}
Expected Clustering Output:
| Cluster | Category | Texts |
|---|---|---|
| 0 | Technology | AI revolutionizing..., ML algorithms..., Neural networks... |
| 1 | Sports | Football popular..., Basketball speed..., Tennis courts... |
| 2 | Food | Italian pasta..., Sushi traditional..., French cuisine... |
| Option | Type | Default | Description |
|---|---|---|---|
model_source |
ModelSource |
Pretrained | Where to load model from |
batch_size |
usize |
32 | Texts per inference batch |
max_length |
usize |
512 | Maximum tokens per text |
pooling |
PoolingStrategy |
Mean | Token aggregation method |
normalize |
bool |
true | L2 normalize embeddings |
num_threads |
usize |
4 | ONNX Runtime threads |
cache_dir |
PathBuf |
~/.cache/ruvector | Model cache directory |
show_progress |
bool |
true | Show download progress |
optimize_graph |
bool |
true | ONNX graph optimization |
use ruvector_onnx_embeddings::{
EmbedderBuilder, PretrainedModel, PoolingStrategy
};
let embedder = EmbedderBuilder::new()
.pretrained(PretrainedModel::BgeBaseEnV15) // Choose model
.batch_size(64) // Batch size
.max_length(256) // Max tokens
.pooling(PoolingStrategy::Mean) // Pooling strategy
.normalize(true) // L2 normalize
.build()
.await?;
| Strategy | Method | Best For | Example Use |
|---|---|---|---|
Mean |
Average all tokens | General purpose | Default choice |
Cls |
[CLS] token only | BERT-style models | Classification |
Max |
Max across tokens | Keyword matching | Entity extraction |
MeanSqrtLen |
Mean / sqrt(len) | Length-invariant | Mixed-length comparison |
LastToken |
Final token | Decoder models | GPT-style |
WeightedMean |
Position-weighted | Custom scenarios | Special cases |
Text Type Recommended Strategy
─────────────────────────────────────────
Short sentences Mean (default)
Long documents MeanSqrtLen
BERT fine-tuned Cls
Keyword search Max
Decoder models LastToken
Tested on AMD EPYC 7763 (64-core), Ubuntu 22.04
| Configuration | Single Text | Batch 32 | Batch 128 | Throughput |
|---|---|---|---|---|
| CPU (1 thread) | 22ms | 180ms | 680ms | 188/sec |
| CPU (8 threads) | 18ms | 85ms | 310ms | 413/sec |
| CUDA A100 | 4ms | 15ms | 45ms | 2,844/sec |
| TensorRT A100 | 2ms | 8ms | 25ms | 5,120/sec |
| Model | Parameters | ONNX Size | Runtime RAM | GPU VRAM |
|---|---|---|---|---|
| AllMiniLmL6V2 | 22M | 23MB | 150MB | 200MB |
| AllMpnetBaseV2 | 109M | 110MB | 400MB | 600MB |
| BgeBaseEnV15 | 109M | 110MB | 400MB | 600MB |
| Index Size | Insert Time | Search (top-10) | Memory |
|---|---|---|---|
| 1,000 | 0.5s | 0.2ms | 2MB |
| 10,000 | 4s | 0.5ms | 15MB |
| 100,000 | 40s | 2ms | 150MB |
| 1,000,000 | 7min | 8ms | 1.5GB |
// Main Embedder
pub struct Embedder;
impl Embedder {
pub async fn new(config: EmbedderConfig) -> Result<Self>;
pub async fn default_model() -> Result<Self>;
pub async fn pretrained(model: PretrainedModel) -> Result<Self>;
pub fn embed_one(&mut self, text: &str) -> Result<Vec<f32>>;
pub fn embed<S: AsRef<str>>(&mut self, texts: &[S]) -> Result<EmbeddingOutput>;
pub fn similarity(&mut self, text1: &str, text2: &str) -> Result<f32>;
pub fn cluster<S>(&mut self, texts: &[S], n: usize) -> Result<Vec<usize>>;
pub fn dimension(&self) -> usize;
pub fn model_info(&self) -> &ModelInfo;
}
// Search Index
pub struct RuVectorEmbeddings;
impl RuVectorEmbeddings {
pub fn new(name: &str, embedder: Embedder, config: IndexConfig) -> Result<Self>;
pub fn insert(&self, text: &str, metadata: Option<Value>) -> Result<VectorId>;
pub fn insert_batch<S>(&self, texts: &[S]) -> Result<Vec<VectorId>>;
pub fn search(&self, query: &str, k: usize) -> Result<Vec<SearchResult>>;
pub fn len(&self) -> usize;
}
// RAG Pipeline
pub struct RagPipeline;
impl RagPipeline {
pub fn new(index: RuVectorEmbeddings, top_k: usize) -> Self;
pub fn retrieve(&self, query: &str) -> Result<Vec<String>>;
pub fn add_documents<S>(&mut self, docs: &[S]) -> Result<Vec<VectorId>>;
}
┌─────────────────────────────────────────────────────────────────────────┐
│ RuVector ONNX Embeddings │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌───────────┐ │
│ │ Text │ -> │ Tokenizer │ -> │ ONNX │ -> │ Pooling │ │
│ │ Input │ │ (HF Rust) │ │ Runtime │ │ Strategy │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ └───────────┘ │
│ │ │
│ v │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌───────────┐ │
│ │ Search │ <- │ Vector │ <- │ Normalize │ <- │ Embedding │ │
│ │ Results │ │ Index │ │ (L2) │ │ Vector │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ └───────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘
| Issue | Cause | Solution |
|---|---|---|
| Model download fails | Network/firewall | Use local model or check connection |
| Out of memory | Large model/batch | Reduce batch_size or use smaller model |
| Slow inference | CPU-bound | Enable GPU or increase num_threads |
| Dimension mismatch | Different models | Ensure same model for index and query |
| CUDA not found | Missing driver | Install CUDA toolkit and drivers |
// Enable verbose logging
std::env::set_var("RUST_LOG", "debug");
tracing_subscriber::fmt::init();
// Check model loading
let embedder = Embedder::default_model().await?;
println!("Model: {}", embedder.model_info().name);
println!("Dimension: {}", embedder.dimension());
# Run all benchmarks
cargo bench
# Generate HTML report
cargo bench -- --verbose
open target/criterion/report/index.html
# Basic embedding
cargo run --example basic_embedding
# Batch processing
cargo run --example batch_embedding
# Semantic search
cargo run --example semantic_search
# Full interactive demo
cargo run
MIT License - See LICENSE for details.
Built with Rust for the RuVector ecosystem.