| Crates.io | muvera-rs |
| lib.rs | muvera-rs |
| version | 0.2.0 |
| created_at | 2025-07-05 13:51:55.044034+00 |
| updated_at | 2025-07-12 14:13:26.37828+00 |
| description | An unofficial Rust implementation of MuVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings |
| homepage | https://github.com/NewBornRustacean/muvera-rs |
| repository | https://github.com/NewBornRustacean/muvera-rs |
| max_upload_size | |
| id | 1739111 |
| size | 129,599 |
An unofficial Rust implementation of MuVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings.
MuVERA is a breakthrough algorithm that enables efficient multi-vector similarity search by reducing it to single-vector search. This implementation provides the core Fixed Dimensional Encoding (FDE) functionality that transforms variable-length token embeddings into fixed-dimensional vectors while preserving similarity relationships.
Multi-vector models (like ColBERT) produce one embedding per token, yielding multiple embeddings per query or document and achieving superior retrieval performance compared to single-vector models. However, multi-vector retrieval is computationally expensive because scoring uses the more complex Chamfer similarity.
MuVERA solves this by compressing each variable-length set of token embeddings into a single Fixed Dimensional Encoding (FDE), reducing multi-vector retrieval to ordinary single-vector search.
The key insight is that the dot product of FDEs approximates the true multi-vector Chamfer similarity, allowing retrieval systems to leverage decades of optimization in single-vector search while maintaining multi-vector quality.
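To make the target concrete, here is a minimal sketch (not part of this crate's API; the function name is illustrative) of the Chamfer similarity that FDE dot products approximate: each query token is matched against its best document token by dot product, and those maxima are summed.
use ndarray::ArrayView2;

// Chamfer similarity between two token-embedding matrices of shape
// (num_tokens, embedding_dim): for each query token, take the max dot
// product over all document tokens, then sum over query tokens.
fn chamfer_similarity(query: ArrayView2<f32>, doc: ArrayView2<f32>) -> f32 {
    // Pairwise dot products, shape (num_query_tokens, num_doc_tokens)
    let sims = query.dot(&doc.t());
    sims.rows()
        .into_iter()
        .map(|row| row.iter().copied().fold(f32::NEG_INFINITY, f32::max))
        .sum()
}
Evaluating this exactly costs a matrix product per query-document pair, which is what FDEs let retrieval systems avoid at search time.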
- Generic over f32 and f64
- Uses ndarray for optimized vector operations

Add to your Cargo.toml:
[dependencies]
muvera-rs = "0.2.0"
Install from PyPI:
pip install muvera
Or install from source:
pip install maturin
git clone https://github.com/NewBornRustacean/muvera-rs.git
cd muvera-rs
maturin develop --features python-bindings
After building with maturin, you can test the Python bindings interactively:
import numpy as np
import muvera
embeddings = np.random.randn(32, 768).astype(np.float32)
result = muvera.encode_fde(embeddings, 3, "mean")  # 3 buckets, mean aggregation
print(result)
Or save the above as a script and run with python my_test.py.
import numpy as np
import muvera
# Create token embeddings (num_tokens, embedding_dim)
embeddings = np.random.randn(32, 768).astype(np.float32)
# Encode with mean aggregation
buckets = 3
result = muvera.encode_fde(embeddings, buckets, "mean")
print(f"FDE result shape: {result.shape}")
# Encode with max aggregation
result_max = muvera.encode_fde(embeddings, buckets, "max")
print(f"FDE max result shape: {result_max.shape}")
use muvera_rs::encoder::fde_encoder::{FDEEncoder, FDEEncoding};
use muvera_rs::types::Aggregation;
use ndarray::Array2;
fn main() {
// Create encoder with 128 buckets for 768-dimensional embeddings
let encoder = FDEEncoder::new(128, 768, 42);
// Example token embeddings (num_tokens, embedding_dim)
let tokens = Array2::from_shape_vec((32, 768), vec![0.1; 32 * 768]).unwrap();
// Encode query (sum aggregation)
let query_fde = encoder.encode_query(tokens.view());
// Encode document (average aggregation)
let doc_fde = encoder.encode_doc(tokens.view());
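// Score with a plain dot product between the two FDEs; per the MuVERA
// idea this approximates the Chamfer similarity of the token sets.
// (Sketch: assumes encode_query/encode_doc return 1-D ndarray vectors.)
let score = query_fde.dot(&doc_fde);
println!("approximate Chamfer similarity: {score}");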
}
use muvera_rs::encoder::fde_encoder::{FDEEncoder, FDEEncoding};
use ndarray::Array3;
fn encode_batch() {
let encoder = FDEEncoder::new(128, 768, 42);
// Batch of token embeddings (batch_size, num_tokens, embedding_dim)
let batch_tokens = Array3::from_shape_vec((100, 32, 768), vec![0.1; 100 * 32 * 768]).unwrap();
// Encode entire batch
let batch_fdes = encoder.encode_query_batch(batch_tokens.view());
println!("Encoded batch shape: {:?}", batch_fdes.shape());
}
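Because FDEs are plain vectors, retrieval over an encoded corpus is ordinary single-vector scoring. Below is a brute-force ranking sketch; it assumes, as the shape printout above suggests, that batch FDEs come back as a 2-D array of shape (batch_size, buckets * dim), and rank_docs is an illustrative helper, not crate API.
use ndarray::{Array1, Array2};

// Rank documents for one query by FDE dot product, highest score first.
// Sketch only: assumes doc_fdes has shape (num_docs, buckets * dim).
fn rank_docs(query_fde: &Array1<f32>, doc_fdes: &Array2<f32>) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = doc_fdes
        .rows()
        .into_iter()
        .enumerate()
        .map(|(i, row)| (i, row.dot(query_fde)))
        .collect();
    // Sort descending by score; FDE scores are finite, so unwrap is safe here
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored
}
In production this ranking step is exactly where an off-the-shelf single-vector ANN index can replace the brute-force loop.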
use muvera_rs::encoder::fde_encoder::{FDEEncoder, FDEEncoding};
use muvera_rs::types::Aggregation;
use ndarray::Array2;
fn custom_encoding() {
let encoder = FDEEncoder::new(128, 768, 42);
let tokens = Array2::from_shape_vec((32, 768), vec![0.1; 32 * 768]).unwrap();
// Use custom aggregation mode
let fde_sum = encoder.encode(tokens.view(), Aggregation::Sum);
let fde_avg = encoder.encode(tokens.view(), Aggregation::Avg);
}
FDEEncoder<T>
The main encoder struct that implements the Fixed Dimensional Encoding algorithm.
pub fn new(buckets: usize, dim: usize, seed: u64) -> Self
- buckets: Number of hash buckets (hyperplanes)
- dim: Dimensionality of input token embeddings
- seed: Random seed for reproducible hyperplane initialization

Methods:
- encode(tokens, mode): Encode a single multi-vector
- batch_encode(tokens, mode): Encode a batch of multi-vectors in parallel (see the sketch below)
- encode_query(tokens): Encode a query with sum aggregation
- encode_doc(tokens): Encode a document with average aggregation
- encode_query_batch(tokens): Batch encode queries
- encode_doc_batch(tokens): Batch encode documents
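The batch_encode method is listed above but not shown in the earlier examples; here is a hedged sketch assuming it takes an ArrayView3 plus an Aggregation mode, mirroring encode.
use muvera_rs::encoder::fde_encoder::{FDEEncoder, FDEEncoding};
use muvera_rs::types::Aggregation;
use ndarray::Array3;

fn batch_with_explicit_mode() {
    let encoder = FDEEncoder::new(128, 768, 42);
    // Batch of token embeddings (batch_size, num_tokens, embedding_dim)
    let batch = Array3::from_shape_vec((10, 32, 768), vec![0.1; 10 * 32 * 768]).unwrap();
    // Equivalent to encode_query_batch when the mode is Sum
    // (sketch; signature assumed from the method list above)
    let fdes = encoder.batch_encode(batch.view(), Aggregation::Sum);
    println!("batch FDE shape: {:?}", fdes.shape());
}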
Aggregation
Enum defining aggregation modes:
- Sum: Sum all tokens in each bucket
- Avg: Average all tokens in each bucket

Performance notes:
- Output FDEs are buckets * dim in size; for example, 128 buckets with 768-dimensional embeddings yields FDEs of 128 * 768 = 98,304 components
- f32 is typically sufficient and faster than f64

Planned work:
- Benchmark Suite
- Advanced Features:
  - BLAS support
  - Product Quantization (PQ): 32x compression with minimal quality loss
  - Final Projections: Dimensionality reduction techniques
This project is licensed under the MIT License - see the LICENSE file for details.