
RuVector ONNX Embeddings

Production-ready ONNX-based embedding generation for semantic search and RAG pipelines in pure Rust

This library provides a complete embedding generation system built entirely in Rust using ONNX Runtime. It is designed for high-performance vector databases, semantic search engines, and AI applications.

Features

Feature                  Description
──────────────────────────────────────────────────────────────────────
Native ONNX Runtime      Direct ONNX model execution via ort 2.0
Pretrained Models        8 popular sentence-transformer models
HuggingFace Integration  Download any compatible model from HF Hub
Multiple Pooling         Mean, CLS, Max, MeanSqrtLen, LastToken, WeightedMean
Batch Processing         Efficient batch embedding with configurable size
GPU Acceleration         CUDA, TensorRT, CoreML support
Vector Search            Built-in similarity search (cosine, euclidean, dot)
RAG Pipeline             Ready-to-use retrieval-augmented generation
Thread-Safe              Safe concurrent use via RwLock
Zero Python              Pure Rust - no Python dependencies

Quick Start

use ruvector_onnx_embeddings::Embedder;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Create embedder with default model
    let mut embedder = Embedder::default_model().await?;

    // Generate embedding
    let embedding = embedder.embed_one("Hello, world!")?;
    println!("Embedding dimension: {}", embedding.len()); // 384

    // Compute semantic similarity
    let sim = embedder.similarity(
        "I love programming in Rust",
        "Rust is my favorite language"
    )?;
    println!("Similarity: {:.4}", sim); // ~0.85

    Ok(())
}
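
Under the hood, similarity() is standard cosine similarity between the two embeddings. For reference, here is a standalone sketch of that computation (independent of the crate); with normalize(true), the crate's default, both norms are 1 and the score reduces to a plain dot product:

/// Cosine similarity between two embedding vectors.
/// For L2-normalized embeddings (the crate's default) the norms are 1,
/// so this reduces to a plain dot product.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embedding dimensions must match");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}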

Installation

Step 1: Add Dependencies

[dependencies]
ruvector-onnx-embeddings = "0.1"
tokio = { version = "1", features = ["full"] }
anyhow = "1.0"

Step 2: Choose Features (Optional)

Feature    Command                           Description
─────────────────────────────────────────────────────────────
Default    cargo build                       CPU inference
CUDA       cargo build --features cuda       NVIDIA GPU
TensorRT   cargo build --features tensorrt   NVIDIA optimized
CoreML     cargo build --features coreml     Apple Silicon

Step 3: Run Examples

# Basic example
cargo run --example basic_embedding

# Full demo with all features
cargo run

Supported Models

Model Comparison Table

Model           Dimension  Max Tokens  Size   Speed  Quality  Best For
─────────────────────────────────────────────────────────────────────────────────────
AllMiniLmL6V2   384        256         23MB   ⚡⚡⚡    ⭐⭐⭐      Default - Fast, general-purpose
AllMiniLmL12V2  384        256         33MB   ⚡⚡     ⭐⭐⭐⭐     Better quality, balanced
AllMpnetBaseV2  768        384         110MB         ⭐⭐⭐⭐⭐    Best quality, production
E5SmallV2       384        512         33MB   ⚡⚡⚡    ⭐⭐⭐⭐     Search & retrieval
E5BaseV2        768        512         110MB         ⭐⭐⭐⭐⭐    High-quality search
BgeSmallEnV15   384        512         33MB   ⚡⚡⚡    ⭐⭐⭐⭐     State-of-the-art small
BgeBaseEnV15    768        512         110MB         ⭐⭐⭐⭐⭐    Best overall quality
GteSmall        384        512         33MB   ⚡⚡⚡    ⭐⭐⭐⭐     Multilingual support

Model Selection Flowchart

┌─────────────────────────────────────────────────────────────────┐
│                     Which Model Should I Use?                    │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Priority: Speed?       ──────►  AllMiniLmL6V2 (23MB, 384d)     │
│                                                                  │
│  Priority: Quality?     ──────►  AllMpnetBaseV2 (110MB, 768d)   │
│                                                                  │
│  Building search?       ──────►  BgeSmallEnV15 or E5SmallV2     │
│                                                                  │
│  Multilingual?          ──────►  GteSmall                       │
│                                                                  │
│  Production RAG?        ──────►  BgeBaseEnV15 or E5BaseV2       │
│                                                                  │
│  Memory constrained?    ──────►  AllMiniLmL6V2                  │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
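
To encode this decision in code, a small illustrative helper over the variants from the table above (the Priority enum is hypothetical, not a crate type):

use ruvector_onnx_embeddings::PretrainedModel;

/// Illustrative priorities mirroring the flowchart (not part of the crate).
enum Priority {
    Speed,
    Quality,
    Search,
    Multilingual,
    ProductionRag,
    LowMemory,
}

/// Map a priority to a pretrained model, following the flowchart above.
fn pick_model(priority: Priority) -> PretrainedModel {
    match priority {
        Priority::Speed | Priority::LowMemory => PretrainedModel::AllMiniLmL6V2,
        Priority::Quality => PretrainedModel::AllMpnetBaseV2,
        Priority::Search => PretrainedModel::BgeSmallEnV15,
        Priority::Multilingual => PretrainedModel::GteSmall,
        Priority::ProductionRag => PretrainedModel::BgeBaseEnV15,
    }
}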

Tutorial: Step-by-Step Guide

Step 1: Basic Embedding Generation

Goal: Generate your first embedding and understand the output.

use ruvector_onnx_embeddings::Embedder;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // 1. Create an embedder (downloads model on first run)
    println!("Loading model...");
    let mut embedder = Embedder::default_model().await?;

    // 2. Check model info
    println!("Model: {}", embedder.model_info().name);
    println!("Dimension: {}", embedder.dimension());
    println!("Max tokens: {}", embedder.max_length());

    // 3. Generate an embedding
    let text = "The quick brown fox jumps over the lazy dog.";
    let embedding = embedder.embed_one(text)?;

    // 4. Examine the output
    println!("\nInput: \"{}\"", text);
    println!("Output shape: [{} dimensions]", embedding.len());
    println!("First 5 values: [{:.4}, {:.4}, {:.4}, {:.4}, {:.4}]",
        embedding[0], embedding[1], embedding[2], embedding[3], embedding[4]);

    // 5. Compute similarity between texts
    let text1 = "I love programming in Rust.";
    let text2 = "Rust is my favorite programming language.";
    let text3 = "The weather is nice today.";

    let sim_related = embedder.similarity(text1, text2)?;
    let sim_unrelated = embedder.similarity(text1, text3)?;

    println!("\nSimilarity comparisons:");
    println!("  \"{}\" vs \"{}\"", text1, text2);
    println!("  Similarity: {:.4} (high - related topics)", sim_related);
    println!();
    println!("  \"{}\" vs \"{}\"", text1, text3);
    println!("  Similarity: {:.4} (low - unrelated topics)", sim_unrelated);

    Ok(())
}

Expected Output:

Loading model...
Model: all-MiniLM-L6-v2
Dimension: 384
Max tokens: 256

Input: "The quick brown fox jumps over the lazy dog."
Output shape: [384 dimensions]
First 5 values: [0.0234, -0.0156, 0.0891, -0.0412, 0.0567]

Similarity comparisons:
  "I love programming in Rust." vs "Rust is my favorite programming language."
  Similarity: 0.8523 (high - related topics)

  "I love programming in Rust." vs "The weather is nice today."
  Similarity: 0.1234 (low - unrelated topics)

Step 2: Batch Processing

Goal: Efficiently process multiple texts at once.

use ruvector_onnx_embeddings::{EmbedderBuilder, PretrainedModel, PoolingStrategy};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // 1. Configure for batch processing
    let mut embedder = EmbedderBuilder::new()
        .pretrained(PretrainedModel::AllMiniLmL6V2)
        .batch_size(64)           // Process 64 texts at a time
        .normalize(true)          // L2 normalize (recommended for cosine similarity)
        .pooling(PoolingStrategy::Mean)
        .build()
        .await?;

    // 2. Prepare your data
    let texts = vec![
        "Artificial intelligence is transforming technology.",
        "Machine learning models learn from data.",
        "Deep learning uses neural networks.",
        "Natural language processing understands text.",
        "Computer vision analyzes images.",
        "Reinforcement learning optimizes decisions.",
        "Vector databases enable semantic search.",
        "Embeddings capture semantic meaning.",
    ];

    // 3. Generate embeddings
    println!("Embedding {} texts...", texts.len());
    let start = std::time::Instant::now();
    let output = embedder.embed(&texts)?;
    let elapsed = start.elapsed();

    // 4. Examine results
    println!("Completed in {:?}", elapsed);
    println!("Total embeddings: {}", output.len());
    println!("Embedding dimension: {}", output.dimension);

    // 5. Show token counts per text
    println!("\nToken counts:");
    for (i, (text, tokens)) in texts.iter().zip(output.token_counts.iter()).enumerate() {
        println!("  [{}] {} tokens: \"{}...\"", i, tokens, &text[..40.min(text.len())]);
    }

    // 6. Access individual embeddings
    println!("\nFirst embedding (first 5 values):");
    let first = output.get(0).unwrap();
    println!("  [{:.4}, {:.4}, {:.4}, {:.4}, {:.4}, ...]",
        first[0], first[1], first[2], first[3], first[4]);

    Ok(())
}

Performance Table: Batch Size vs Throughput

Batch Size   Time (8 texts)   Throughput   Memory
───────────────────────────────────────────────────
1            45ms             178/sec      150MB
8            35ms             228/sec      160MB
32           28ms             285/sec      180MB
64           25ms             320/sec      200MB
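
These figures are hardware- and model-dependent, so treat them as indicative. A minimal sketch for measuring throughput on your own machine by sweeping the builder's batch_size:

use ruvector_onnx_embeddings::{EmbedderBuilder, PretrainedModel};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let texts: Vec<String> = (0..256)
        .map(|i| format!("Sample document number {i} for throughput testing."))
        .collect();

    for batch_size in [1, 8, 32, 64] {
        // Rebuild the embedder per batch size (the setting is fixed at build time).
        let mut embedder = EmbedderBuilder::new()
            .pretrained(PretrainedModel::AllMiniLmL6V2)
            .batch_size(batch_size)
            .build()
            .await?;

        let start = std::time::Instant::now();
        let _ = embedder.embed(&texts)?;
        let secs = start.elapsed().as_secs_f64();
        println!("batch_size={batch_size:>2}: {:.0} texts/sec", texts.len() as f64 / secs);
    }
    Ok(())
}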

Step 3: Building a Semantic Search Engine

Goal: Create a searchable knowledge base with semantic understanding.

use ruvector_onnx_embeddings::{
    Embedder, RuVectorBuilder, Distance
};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // 1. Create embedder
    println!("Step 1: Loading embedder...");
    let embedder = Embedder::default_model().await?;

    // 2. Create search index
    println!("Step 2: Creating search index...");
    let index = RuVectorBuilder::new("programming_languages")
        .embedder(embedder)
        .distance(Distance::Cosine)      // Best for normalized embeddings
        .max_elements(100_000)           // Pre-allocate for 100k vectors
        .build()?;

    // 3. Index documents
    println!("Step 3: Indexing documents...");
    let documents = vec![
        "Rust is a systems programming language focused on safety and performance.",
        "Python is widely used for machine learning and data science applications.",
        "JavaScript is the language of the web, running in browsers everywhere.",
        "Go is designed for building scalable and efficient server applications.",
        "TypeScript adds static typing to JavaScript for better developer experience.",
        "C++ provides low-level control and high performance for system software.",
        "Java is a mature, object-oriented language popular in enterprise software.",
        "Swift is Apple's modern language for iOS and macOS development.",
        "Kotlin is a concise language that runs on the JVM, popular for Android.",
        "Haskell is a purely functional programming language with strong typing.",
    ];

    index.insert_batch(&documents)?;
    println!("   Indexed {} documents", documents.len());
    println!("   Index size: {} vectors", index.len());

    // 4. Perform searches
    println!("\nStep 4: Running searches...\n");

    let queries = vec![
        "What language is best for web development?",
        "I want to build a high-performance system application",
        "Which language should I learn for machine learning?",
        "I need a language for mobile app development",
    ];

    for query in queries {
        println!("🔍 Query: \"{}\"", query);
        let results = index.search(query, 3)?;

        for (i, result) in results.iter().enumerate() {
            println!("   {}. (score: {:.4}) {}",
                i + 1,
                result.score,
                result.text);
        }
        println!();
    }

    Ok(())
}

Search Results Table:

Query                                          Top Result                                        Score
──────────────────────────────────────────────────────────────────────────────────────────────────────
"What language is best for web development?"   "JavaScript is the language of the web..."        0.82
"high-performance system application"          "Rust is a systems programming language..."       0.78
"machine learning"                             "Python is widely used for machine learning..."   0.85
"mobile app development"                       "Swift is Apple's modern language for iOS..."     0.76

Step 4: Creating a RAG Pipeline

Goal: Build a retrieval-augmented generation system for LLM context.

use ruvector_onnx_embeddings::{
    Embedder, RuVectorEmbeddings, RagPipeline
};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // 1. Create knowledge base
    println!("Step 1: Creating knowledge base...");
    let embedder = Embedder::default_model().await?;
    let index = RuVectorEmbeddings::new_default("ruvector_docs", embedder)?;

    // 2. Add documentation
    println!("Step 2: Adding documents...");
    let knowledge = vec![
        "RuVector is a distributed vector database that learns and adapts.",
        "RuVector uses HNSW indexing for fast approximate nearest neighbor search.",
        "The embedding dimension in RuVector is configurable based on your model.",
        "RuVector supports multiple distance metrics: Cosine, Euclidean, and Dot Product.",
        "Graph Neural Networks in RuVector improve search quality over time.",
        "RuVector integrates with ONNX models for native embedding generation.",
        "The NAPI-RS bindings allow using RuVector from Node.js applications.",
        "RuVector supports WebAssembly for running in web browsers.",
        "Quantization in RuVector reduces memory usage by up to 32x.",
        "RuVector can handle millions of vectors with sub-millisecond search.",
    ];

    index.insert_batch(&knowledge)?;

    // 3. Create RAG pipeline
    println!("Step 3: Setting up RAG pipeline...");
    let rag = RagPipeline::new(index, 3); // Retrieve top-3 documents

    // 4. Retrieve context for queries
    println!("\nStep 4: Running RAG queries...\n");

    let queries = vec![
        "How does RuVector perform search?",
        "Can I use RuVector from JavaScript?",
        "How can I reduce memory usage?",
    ];

    for query in queries {
        println!("📝 Query: \"{}\"", query);
        let context = rag.retrieve(query)?;

        println!("   Retrieved context:");
        for (i, doc) in context.iter().enumerate() {
            println!("   {}. {}", i + 1, doc);
        }

        // Format for LLM prompt
        println!("\n   LLM Prompt:");
        println!("   ───────────────────────────────────────");
        println!("   Given the following context:");
        for doc in &context {
            println!("   - {}", doc);
        }
        println!("   ");
        println!("   Answer the question: {}", query);
        println!("   ───────────────────────────────────────\n");
    }

    Ok(())
}

RAG Pipeline Flow:

┌──────────┐    ┌─────────────┐    ┌──────────┐    ┌─────────┐
│  Query   │───►│  Embedder   │───►│  Search  │───►│ Context │
│          │    │             │    │  Index   │    │         │
└──────────┘    └─────────────┘    └──────────┘    └────┬────┘
                                                        │
                                                        v
┌──────────┐    ┌─────────────┐    ┌──────────┐    ┌─────────┐
│ Response │◄───│    LLM      │◄───│  Prompt  │◄───│ Format  │
│          │    │ (external)  │    │          │    │         │
└──────────┘    └─────────────┘    └──────────┘    └─────────┘
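
The Format stage in this diagram is plain string assembly left to the caller. A hypothetical helper (not part of the crate) that mirrors the prompt layout printed by the example above:

/// Assemble retrieved documents and a question into a single LLM prompt.
/// Illustrative only; the crate leaves prompt formatting to the caller.
fn format_prompt(context: &[String], query: &str) -> String {
    let mut prompt = String::from("Given the following context:\n");
    for doc in context {
        prompt.push_str("- ");
        prompt.push_str(doc);
        prompt.push('\n');
    }
    prompt.push_str("\nAnswer the question: ");
    prompt.push_str(query);
    prompt
}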

Step 5: Text Clustering

Goal: Automatically group similar texts together.

use ruvector_onnx_embeddings::Embedder;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let mut embedder = Embedder::default_model().await?;

    // Mixed-category texts
    let texts = vec![
        // Technology (expected cluster 0)
        "Artificial intelligence is revolutionizing industries.",
        "Machine learning algorithms process large datasets.",
        "Neural networks mimic the human brain.",
        // Sports (expected cluster 1)
        "Football is the most popular sport worldwide.",
        "Basketball requires speed and agility.",
        "Tennis is played on different court surfaces.",
        // Food (expected cluster 2)
        "Italian pasta comes in many shapes and sizes.",
        "Sushi is a traditional Japanese dish.",
        "French cuisine is known for its elegance.",
    ];

    println!("Clustering {} texts into 3 categories...\n", texts.len());

    // Perform clustering
    let clusters = embedder.cluster(&texts, 3)?;

    // Group and display results
    let mut groups: std::collections::HashMap<usize, Vec<&str>> =
        std::collections::HashMap::new();

    for (i, &cluster) in clusters.iter().enumerate() {
        groups.entry(cluster).or_default().push(texts[i]);
    }

    println!("Clustering Results:");
    println!("═══════════════════════════════════════════");

    for (cluster_id, members) in groups.iter() {
        println!("\n📁 Cluster {}:", cluster_id);
        for text in members {
            println!("   • {}", text);
        }
    }

    Ok(())
}

Expected Clustering Output:

Cluster   Category     Texts
──────────────────────────────────────────────────────────────────────────
0         Technology   AI revolutionizing..., ML algorithms..., Neural networks...
1         Sports       Football popular..., Basketball speed..., Tennis courts...
2         Food         Italian pasta..., Sushi traditional..., French cuisine...
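
This README does not document which algorithm cluster() uses internally. For intuition only, here is a standalone sketch of the assignment step of k-means over L2-normalized embeddings, a common choice, where cosine similarity reduces to a dot product:

/// One k-means assignment step: each embedding goes to the centroid with
/// the highest dot product (cosine similarity for normalized vectors).
/// Illustrative sketch; not necessarily the crate's implementation.
fn assign_clusters(embeddings: &[Vec<f32>], centroids: &[Vec<f32>]) -> Vec<usize> {
    embeddings
        .iter()
        .map(|e| {
            centroids
                .iter()
                .enumerate()
                .map(|(i, c)| {
                    let dot: f32 = e.iter().zip(c).map(|(x, y)| x * y).sum();
                    (i, dot)
                })
                .max_by(|a, b| a.1.partial_cmp(&b.1).unwrap())
                .map(|(i, _)| i)
                .unwrap_or(0)
        })
        .collect()
}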

Configuration Reference

EmbedderConfig Options

Option           Type              Default             Description
────────────────────────────────────────────────────────────────────────
model_source     ModelSource       Pretrained          Where to load model from
batch_size       usize             32                  Texts per inference batch
max_length       usize             512                 Maximum tokens per text
pooling          PoolingStrategy   Mean                Token aggregation method
normalize        bool              true                L2 normalize embeddings
num_threads      usize             4                   ONNX Runtime threads
cache_dir        PathBuf           ~/.cache/ruvector   Model cache directory
show_progress    bool              true                Show download progress
optimize_graph   bool              true                ONNX graph optimization

Using EmbedderBuilder

use ruvector_onnx_embeddings::{
    EmbedderBuilder, PretrainedModel, PoolingStrategy
};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let embedder = EmbedderBuilder::new()
        .pretrained(PretrainedModel::BgeBaseEnV15)  // Choose model
        .batch_size(64)                              // Batch size
        .max_length(256)                             // Max tokens
        .pooling(PoolingStrategy::Mean)              // Pooling strategy
        .normalize(true)                             // L2 normalize
        .build()
        .await?;

    println!("Dimension: {}", embedder.dimension()); // 768 for BgeBaseEnV15
    Ok(())
}

Pooling Strategies

Strategy       Method               Best For            Example Use
──────────────────────────────────────────────────────────────────────────
Mean           Average all tokens   General purpose     Default choice
Cls            [CLS] token only     BERT-style models   Classification
Max            Max across tokens    Keyword matching    Entity extraction
MeanSqrtLen    Mean / sqrt(len)     Length-invariant    Mixed-length comparison
LastToken      Final token          Decoder models      GPT-style
WeightedMean   Position-weighted    Custom scenarios    Special cases

Choosing a Strategy

Text Type          Recommended Strategy
─────────────────────────────────────────
Short sentences    Mean (default)
Long documents     MeanSqrtLen
BERT fine-tuned    Cls
Keyword search     Max
Decoder models     LastToken
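
For reference, mean pooling averages token embeddings while using the attention mask to skip padding; MeanSqrtLen differs only in the divisor. A standalone sketch, not the crate's internals:

/// Mean pooling over token embeddings, skipping padding via the attention mask.
fn mean_pool(token_embeddings: &[Vec<f32>], attention_mask: &[u32]) -> Vec<f32> {
    let dim = token_embeddings.first().map_or(0, |t| t.len());
    let mut pooled = vec![0.0f32; dim];
    let mut count = 0usize;
    for (token, &mask) in token_embeddings.iter().zip(attention_mask) {
        if mask == 1 {
            for (p, &v) in pooled.iter_mut().zip(token) {
                *p += v;
            }
            count += 1;
        }
    }
    if count > 0 {
        let divisor = count as f32; // MeanSqrtLen would use (count as f32).sqrt()
        for p in &mut pooled {
            *p /= divisor;
        }
    }
    pooled
}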

Performance Benchmarks

Embedding Generation Speed

Tested on AMD EPYC 7763 (64-core), Ubuntu 22.04

Configuration     Single Text   Batch 32   Batch 128   Throughput
───────────────────────────────────────────────────────────────────
CPU (1 thread)    22ms          180ms      680ms       188/sec
CPU (8 threads)   18ms          85ms       310ms       413/sec
CUDA A100         4ms           15ms       45ms        2,844/sec
TensorRT A100     2ms           8ms        25ms        5,120/sec

Memory Usage

Model            Parameters   ONNX Size   Runtime RAM   GPU VRAM
─────────────────────────────────────────────────────────────────
AllMiniLmL6V2    22M          23MB        150MB         200MB
AllMpnetBaseV2   109M         110MB       400MB         600MB
BgeBaseEnV15     109M         110MB       400MB         600MB

Similarity Search Latency

Index Size   Insert Time   Search (top-10)   Memory
─────────────────────────────────────────────────────
1,000        0.5s          0.2ms             2MB
10,000       4s            0.5ms             15MB
100,000      40s           2ms               150MB
1,000,000    7min          8ms               1.5GB

API Reference

Core Types

// Main Embedder
pub struct Embedder;

impl Embedder {
    pub async fn new(config: EmbedderConfig) -> Result<Self>;
    pub async fn default_model() -> Result<Self>;
    pub async fn pretrained(model: PretrainedModel) -> Result<Self>;

    pub fn embed_one(&mut self, text: &str) -> Result<Vec<f32>>;
    pub fn embed<S: AsRef<str>>(&mut self, texts: &[S]) -> Result<EmbeddingOutput>;
    pub fn similarity(&mut self, text1: &str, text2: &str) -> Result<f32>;
    pub fn cluster<S>(&mut self, texts: &[S], n: usize) -> Result<Vec<usize>>;

    pub fn dimension(&self) -> usize;
    pub fn model_info(&self) -> &ModelInfo;
}

// Search Index
pub struct RuVectorEmbeddings;

impl RuVectorEmbeddings {
    pub fn new(name: &str, embedder: Embedder, config: IndexConfig) -> Result<Self>;
    pub fn insert(&self, text: &str, metadata: Option<Value>) -> Result<VectorId>;
    pub fn insert_batch<S>(&self, texts: &[S]) -> Result<Vec<VectorId>>;
    pub fn search(&self, query: &str, k: usize) -> Result<Vec<SearchResult>>;
    pub fn len(&self) -> usize;
}

// RAG Pipeline
pub struct RagPipeline;

impl RagPipeline {
    pub fn new(index: RuVectorEmbeddings, top_k: usize) -> Self;
    pub fn retrieve(&self, query: &str) -> Result<Vec<String>>;
    pub fn add_documents<S>(&mut self, docs: &[S]) -> Result<Vec<VectorId>>;
}
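
The insert method above accepts optional per-document metadata. A hedged usage sketch, assuming Value in the signature is serde_json::Value (this README does not confirm the exact type):

use ruvector_onnx_embeddings::{Embedder, RuVectorEmbeddings};
use serde_json::json;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let embedder = Embedder::default_model().await?;
    let index = RuVectorEmbeddings::new_default("notes", embedder)?;

    // Attach metadata to a single document
    // (assumes Value = serde_json::Value; unconfirmed by this README).
    let id = index.insert(
        "Rust is a systems programming language.",
        Some(json!({ "source": "docs", "lang": "en" })),
    )?;
    println!("Inserted vector id: {:?}", id);
    Ok(())
}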

Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                     RuVector ONNX Embeddings                             │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌───────────┐ │
│  │    Text     │ -> │  Tokenizer  │ -> │    ONNX     │ -> │  Pooling  │ │
│  │   Input     │    │ (HF Rust)   │    │   Runtime   │    │  Strategy │ │
│  └─────────────┘    └─────────────┘    └─────────────┘    └───────────┘ │
│                                                                  │       │
│                                                                  v       │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌───────────┐ │
│  │   Search    │ <- │   Vector    │ <- │  Normalize  │ <- │ Embedding │ │
│  │  Results    │    │    Index    │    │   (L2)      │    │  Vector   │ │
│  └─────────────┘    └─────────────┘    └─────────────┘    └───────────┘ │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘

Troubleshooting

Common Issues and Solutions

Issue                  Cause               Solution
──────────────────────────────────────────────────────────────────────────
Model download fails   Network/firewall    Use local model or check connection
Out of memory          Large model/batch   Reduce batch_size or use smaller model
Slow inference         CPU-bound           Enable GPU or increase num_threads
Dimension mismatch     Different models    Ensure same model for index and query
CUDA not found         Missing driver      Install CUDA toolkit and drivers
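
For the dimension-mismatch row in particular, a cheap guard is to compare the embedder's reported dimension against the dimension your index was built with before querying. A sketch using the documented dimension() accessor (the expected_dim value is illustrative):

use ruvector_onnx_embeddings::Embedder;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let embedder = Embedder::default_model().await?;

    // Dimension the index was originally built with (illustrative value).
    let expected_dim = 384;

    anyhow::ensure!(
        embedder.dimension() == expected_dim,
        "dimension mismatch: embedder produces {}, index expects {}",
        embedder.dimension(),
        expected_dim
    );
    Ok(())
}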

Debugging Tips

use ruvector_onnx_embeddings::Embedder;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Enable verbose logging (requires the tracing-subscriber crate)
    std::env::set_var("RUST_LOG", "debug");
    tracing_subscriber::fmt::init();

    // Check model loading
    let embedder = Embedder::default_model().await?;
    println!("Model: {}", embedder.model_info().name);
    println!("Dimension: {}", embedder.dimension());
    Ok(())
}

Running Benchmarks

# Run all benchmarks
cargo bench

# Run with verbose output
cargo bench -- --verbose

# Open the HTML report generated by Criterion during the bench run
open target/criterion/report/index.html

Examples

# Basic embedding
cargo run --example basic_embedding

# Batch processing
cargo run --example batch_embedding

# Semantic search
cargo run --example semantic_search

# Full interactive demo
cargo run

License

MIT License - See LICENSE for details.


Built with Rust for the RuVector ecosystem.
