muvera-rs

version: 0.2.0
repository: https://github.com/NewBornRustacean/muvera-rs
documentation: https://docs.rs/muvera-rs

README

muvera-rs

An unofficial Rust implementation of MuVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings.


Disclaimer

  • This is an unofficial implementation and is not affiliated with Google Research or the original authors.
  • Some parts of this README were generated by AI.

Overview

MuVERA is a breakthrough algorithm that enables efficient multi-vector similarity search by reducing it to single-vector search. This implementation provides the core Fixed Dimensional Encoding (FDE) functionality that transforms variable-length token embeddings into fixed-dimensional vectors while preserving similarity relationships.

The Algorithm

Multi-vector models (like ColBERT) produce multiple embeddings per query/document token, achieving superior retrieval performance compared to single-vector models. However, multi-vector retrieval is computationally expensive due to the complex Chamfer similarity scoring.

MuVERA solves this by:

  1. Fixed Dimensional Encoding (FDE): Transforms sets of token embeddings into fixed-dimensional vectors using randomized hyperplanes and hashing
  2. Asymmetric Encoding: Uses sum aggregation for queries and average aggregation for documents
  3. Single-Vector MIPS: Enables the use of highly-optimized Maximum Inner Product Search algorithms
  4. Theoretical Guarantees: Provides ε-approximation guarantees for Chamfer similarity
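Step 1's partitioning is SimHash-style: a token's bucket is determined by the signs of its dot products with random hyperplanes, so nearby tokens tend to land in the same bucket. A minimal self-contained sketch of that idea (an illustration, not this crate's internals):

```rust
// Assign a token embedding to a bucket from the signs of its dot
// products with k hyperplanes (SimHash-style space partitioning).
// With k hyperplanes there are 2^k possible buckets.
fn bucket_index(token: &[f32], hyperplanes: &[Vec<f32>]) -> usize {
    let mut idx = 0;
    for (bit, h) in hyperplanes.iter().enumerate() {
        let dot: f32 = token.iter().zip(h).map(|(a, b)| a * b).sum();
        if dot >= 0.0 {
            idx |= 1 << bit; // set this bit when the token lies on the positive side
        }
    }
    idx
}

fn main() {
    // Two fixed "hyperplanes" in 2-D: the x-axis normal and the y-axis normal.
    let hyperplanes = vec![vec![1.0, 0.0], vec![0.0, 1.0]];
    // A token in the (+, +) quadrant sets both bits: bucket 0b11 = 3.
    println!("{}", bucket_index(&[0.5, 0.8], &hyperplanes)); // prints 3
}
```

In the real algorithm the hyperplanes are drawn from a seeded random distribution, which is what makes encoding deterministic for a fixed seed.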

The key insight is that the dot product of FDEs approximates the true multi-vector Chamfer similarity, allowing retrieval systems to leverage decades of optimization in single-vector search while maintaining multi-vector quality.
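Chamfer similarity itself, the quantity the FDE dot product approximates, sums each query token's best dot product over all document tokens. A hedged, self-contained illustration in plain Rust (not this crate's API):

```rust
// Chamfer similarity: for every query token, take the maximum dot
// product against any document token, then sum over query tokens.
// The dot product of two FDEs approximates this score, which is why
// retrieval can run as ordinary single-vector MIPS.
fn chamfer(query: &[Vec<f32>], doc: &[Vec<f32>]) -> f32 {
    query
        .iter()
        .map(|q| {
            doc.iter()
                .map(|d| q.iter().zip(d).map(|(a, b)| a * b).sum::<f32>())
                .fold(f32::NEG_INFINITY, f32::max)
        })
        .sum()
}

fn main() {
    let query = vec![vec![1.0, 0.0], vec![0.0, 1.0]];
    let doc = vec![vec![1.0, 0.0], vec![0.5, 0.5]];
    // Query token 1 best-matches doc[0] (dot 1.0);
    // query token 2 best-matches doc[1] (dot 0.5).
    println!("{}", chamfer(&query, &doc)); // prints 1.5
}
```

Note the asymmetry: the max runs over document tokens for each query token, which mirrors the asymmetric sum/average aggregation in the encoding.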

Features

  • Core FDE Implementation: Complete Fixed Dimensional Encoding with configurable buckets
  • Batch Processing: Efficient parallel encoding for large datasets
  • Type Safety: Generic implementation supporting f32 and f64
  • Memory Efficient: Uses ndarray for optimized vector operations
  • Deterministic: Reproducible results with seed-based randomization
  • Comprehensive Testing: Extensive test suite covering edge cases

Installation

Rust

Add to your Cargo.toml:

[dependencies]
muvera-rs = "0.2.0"

Python

Install from PyPI:

pip install muvera

Or install from source:

pip install maturin
git clone https://github.com/NewBornRustacean/muvera-rs.git
cd muvera-rs
maturin develop --features python-bindings

Testing the Python Bindings

After building with maturin, you can test the Python bindings interactively:

import numpy as np
import muvera

embeddings = np.random.randn(32, 768).astype(np.float32)
result = muvera.encode_fde(embeddings, 3, "mean")  # buckets=3
print(result)

Or save the above as a script and run with python my_test.py.

Usage

Python Example

import numpy as np
import muvera

# Create token embeddings (num_tokens, embedding_dim)
embeddings = np.random.randn(32, 768).astype(np.float32)

# Encode with mean aggregation
buckets = 3
result = muvera.encode_fde(embeddings, buckets, "mean")
print(f"FDE result shape: {result.shape}")

# Encode with max aggregation
result_max = muvera.encode_fde(embeddings, buckets, "max")
print(f"FDE max result shape: {result_max.shape}")

Rust Example

use muvera_rs::encoder::fde_encoder::{FDEEncoder, FDEEncoding};
use ndarray::Array2;

fn main() {
    // Create encoder with 128 buckets for 768-dimensional embeddings
    let encoder = FDEEncoder::new(128, 768, 42);
    
    // Example token embeddings (num_tokens, embedding_dim)
    let tokens = Array2::from_shape_vec((32, 768), vec![0.1; 32 * 768]).unwrap();
    
    // Encode query (sum aggregation)
    let query_fde = encoder.encode_query(tokens.view());
    
    // Encode document (average aggregation)
    let doc_fde = encoder.encode_doc(tokens.view());
}

Batch Processing

use muvera_rs::encoder::fde_encoder::{FDEEncoder, FDEEncoding};
use ndarray::Array3;

fn encode_batch() {
    let encoder = FDEEncoder::new(128, 768, 42);
    
    // Batch of token embeddings (batch_size, num_tokens, embedding_dim)
    let batch_tokens = Array3::from_shape_vec((100, 32, 768), vec![0.1; 100 * 32 * 768]).unwrap();
    
    // Encode entire batch
    let batch_fdes = encoder.encode_query_batch(batch_tokens.view());
    
    println!("Encoded batch shape: {:?}", batch_fdes.shape());
}

Custom Aggregation

use muvera_rs::encoder::fde_encoder::{FDEEncoder, FDEEncoding};
use muvera_rs::types::Aggregation;
use ndarray::Array2;

fn custom_encoding() {
    let encoder = FDEEncoder::new(128, 768, 42);
    let tokens = Array2::from_shape_vec((32, 768), vec![0.1; 32 * 768]).unwrap();
    
    // Use custom aggregation mode
    let fde_sum = encoder.encode(tokens.view(), Aggregation::Sum);
    let fde_avg = encoder.encode(tokens.view(), Aggregation::Avg);
}

API Reference

FDEEncoder<T>

The main encoder struct that implements the Fixed Dimensional Encoding algorithm.

Constructor

pub fn new(buckets: usize, dim: usize, seed: u64) -> Self
  • buckets: Number of hash buckets (hyperplanes)
  • dim: Dimensionality of input token embeddings
  • seed: Random seed for reproducible hyperplane initialization

Methods

  • encode(tokens, mode): Encode single multi-vector
  • batch_encode(tokens, mode): Encode batch of multi-vectors (parallel)
  • encode_query(tokens): Encode query with sum aggregation
  • encode_doc(tokens): Encode document with average aggregation
  • encode_query_batch(tokens): Batch encode queries
  • encode_doc_batch(tokens): Batch encode documents

Aggregation

Enum defining aggregation modes:

  • Sum: Sum all tokens in each bucket
  • Avg: Average all tokens in each bucket
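The difference between the two modes, assuming tokens have already been assigned to a bucket (a sketch of the concept, not the crate's internals):

```rust
// Aggregate the tokens that fell into one bucket either by summing
// them coordinate-wise (Sum) or by dividing that sum by the token
// count (Avg). Queries use Sum; documents use Avg.
fn aggregate(bucket_tokens: &[Vec<f32>], dim: usize, average: bool) -> Vec<f32> {
    let mut out = vec![0.0f32; dim];
    for t in bucket_tokens {
        for (o, v) in out.iter_mut().zip(t) {
            *o += v;
        }
    }
    if average && !bucket_tokens.is_empty() {
        let n = bucket_tokens.len() as f32;
        for o in &mut out {
            *o /= n;
        }
    }
    out
}

fn main() {
    let bucket = vec![vec![1.0, 2.0], vec![3.0, 4.0]];
    println!("{:?}", aggregate(&bucket, 2, false)); // Sum -> [4.0, 6.0]
    println!("{:?}", aggregate(&bucket, 2, true));  // Avg -> [2.0, 3.0]
}
```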

Performance Considerations

  • Buckets: More buckets = better approximation but higher memory usage
  • Batch Size: Use batch encoding for multiple vectors to leverage parallelism
  • Memory: FDE vectors are buckets * dim in size
  • Precision: f32 is typically sufficient and faster than f64
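As a concrete check on the memory point: with the 128 buckets and 768 dimensions used in the examples above, each f32 FDE occupies 128 * 768 * 4 bytes, roughly 384 KiB. A back-of-the-envelope helper (hypothetical, not part of the crate's API):

```rust
// Bytes occupied by one FDE vector: buckets * dim entries,
// each elem_size bytes (4 for f32, 8 for f64).
fn fde_bytes(buckets: usize, dim: usize, elem_size: usize) -> usize {
    buckets * dim * elem_size
}

fn main() {
    // 128 buckets, 768-dim, f32 -> 393,216 bytes (~384 KiB) per vector.
    println!("{}", fde_bytes(128, 768, 4));
}
```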

References

Original Paper

  • MuVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings
    Laxman Dhulipala, Jason Lee, Majid Hadian, Rajesh Jayaram, Vahab Mirrokni
    arXiv:2405.19504

Google Research Blog

Original Implementation

Future Work

Planned Features

  1. Benchmark Suite

    • Integration with BEIR datasets (MS MARCO, etc.)
    • Memory usage and compression analysis
  2. Advanced Features

    • Support BLAS

    • Product Quantization (PQ): 32x compression with minimal quality loss

    • Final Projections: Dimensionality reduction techniques

License

This project is licensed under the MIT License - see the LICENSE file for details.
