muvera-rs

version: 0.2.0
repository: https://github.com/NewBornRustacean/muvera-rs
documentation: https://docs.rs/muvera-rs

README

muvera-rs

An unofficial Rust implementation of MuVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings.


Disclaimer

  • This is an unofficial implementation and is not affiliated with Google Research or the original authors.
  • Some parts of this README were generated by AI.

Overview

MuVERA is a breakthrough algorithm that enables efficient multi-vector similarity search by reducing it to single-vector search. This implementation provides the core Fixed Dimensional Encoding (FDE) functionality that transforms variable-length token embeddings into fixed-dimensional vectors while preserving similarity relationships.

The Algorithm

Multi-vector models (like ColBERT) produce multiple embeddings per query/document token, achieving superior retrieval performance compared to single-vector models. However, multi-vector retrieval is computationally expensive due to the complex Chamfer similarity scoring.

MuVERA solves this by:

  1. Fixed Dimensional Encoding (FDE): Transforms sets of token embeddings into fixed-dimensional vectors using randomized hyperplanes and hashing
  2. Asymmetric Encoding: Uses sum aggregation for queries and average aggregation for documents
  3. Single-Vector MIPS: Enables the use of highly-optimized Maximum Inner Product Search algorithms
  4. Theoretical Guarantees: Provides ε-approximation guarantees for Chamfer similarity
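Step 1's partitioning is SimHash-style: a token's bucket is determined by the signs of its dot products with random hyperplanes, so nearby tokens tend to land in the same bucket. A minimal self-contained sketch of that idea (an illustration, not this crate's internals):

```rust
// Assign a token embedding to a bucket from the signs of its dot
// products with k hyperplanes (SimHash-style space partitioning).
// With k hyperplanes there are 2^k possible buckets.
fn bucket_index(token: &[f32], hyperplanes: &[Vec<f32>]) -> usize {
    let mut idx = 0;
    for (bit, h) in hyperplanes.iter().enumerate() {
        let dot: f32 = token.iter().zip(h).map(|(a, b)| a * b).sum();
        if dot >= 0.0 {
            idx |= 1 << bit; // set this bit when the token lies on the positive side
        }
    }
    idx
}

fn main() {
    // Two fixed "hyperplanes" in 2-D: the x-axis normal and the y-axis normal.
    let hyperplanes = vec![vec![1.0, 0.0], vec![0.0, 1.0]];
    // A token in the (+, +) quadrant sets both bits: bucket 0b11 = 3.
    println!("{}", bucket_index(&[0.5, 0.8], &hyperplanes)); // prints 3
}
```

In the real algorithm the hyperplanes are drawn from a seeded random distribution, which is what makes encoding deterministic for a fixed seed.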

The key insight is that the dot product of FDEs approximates the true multi-vector Chamfer similarity, allowing retrieval systems to leverage decades of optimization in single-vector search while maintaining multi-vector quality.
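Chamfer similarity itself, the quantity the FDE dot product approximates, sums each query token's best dot product over all document tokens. A hedged, self-contained illustration in plain Rust (not this crate's API):

```rust
// Chamfer similarity: for every query token, take the maximum dot
// product against any document token, then sum over query tokens.
// The dot product of two FDEs approximates this score, which is why
// retrieval can run as ordinary single-vector MIPS.
fn chamfer(query: &[Vec<f32>], doc: &[Vec<f32>]) -> f32 {
    query
        .iter()
        .map(|q| {
            doc.iter()
                .map(|d| q.iter().zip(d).map(|(a, b)| a * b).sum::<f32>())
                .fold(f32::NEG_INFINITY, f32::max)
        })
        .sum()
}

fn main() {
    let query = vec![vec![1.0, 0.0], vec![0.0, 1.0]];
    let doc = vec![vec![1.0, 0.0], vec![0.5, 0.5]];
    // Query token 1 best-matches doc[0] (dot 1.0);
    // query token 2 best-matches doc[1] (dot 0.5).
    println!("{}", chamfer(&query, &doc)); // prints 1.5
}
```

Note the asymmetry: the max runs over document tokens for each query token, which mirrors the asymmetric sum/average aggregation in the encoding.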

Features

  • Core FDE Implementation: Complete Fixed Dimensional Encoding with configurable buckets
  • Batch Processing: Efficient parallel encoding for large datasets
  • Type Safety: Generic implementation supporting f32 and f64
  • Memory Efficient: Uses ndarray for optimized vector operations
  • Deterministic: Reproducible results with seed-based randomization
  • Comprehensive Testing: Extensive test suite covering edge cases

Installation

Rust

Add to your Cargo.toml:

[dependencies]
muvera-rs = "0.2.0"

Python

Install from PyPI:

pip install muvera

Or install from source:

pip install maturin
git clone https://github.com/NewBornRustacean/muvera-rs.git
cd muvera-rs
maturin develop --features python-bindings

Testing the Python Bindings

After building with maturin, you can test the Python bindings interactively:

import numpy as np
import muvera

embeddings = np.random.randn(32, 768).astype(np.float32)
result = muvera.encode_fde(embeddings, 3, "mean")  # buckets=3
print(result)

Or save the above as a script and run with python my_test.py.

Usage

Python Example

import numpy as np
import muvera

# Create token embeddings (num_tokens, embedding_dim)
embeddings = np.random.randn(32, 768).astype(np.float32)

# Encode with mean aggregation
buckets = 3
result = muvera.encode_fde(embeddings, buckets, "mean")
print(f"FDE result shape: {result.shape}")

# Encode with max aggregation
result_max = muvera.encode_fde(embeddings, buckets, "max")
print(f"FDE max result shape: {result_max.shape}")

Rust Example

use muvera_rs::encoder::fde_encoder::{FDEEncoder, FDEEncoding};
use ndarray::Array2;

fn main() {
    // Create encoder with 128 buckets for 768-dimensional embeddings
    let encoder = FDEEncoder::new(128, 768, 42);
    
    // Example token embeddings (num_tokens, embedding_dim)
    let tokens = Array2::from_shape_vec((32, 768), vec![0.1; 32 * 768]).unwrap();
    
    // Encode query (sum aggregation)
    let query_fde = encoder.encode_query(tokens.view());
    
    // Encode document (average aggregation)
    let doc_fde = encoder.encode_doc(tokens.view());
}

Batch Processing

use muvera_rs::encoder::fde_encoder::{FDEEncoder, FDEEncoding};
use ndarray::Array3;

fn encode_batch() {
    let encoder = FDEEncoder::new(128, 768, 42);
    
    // Batch of token embeddings (batch_size, num_tokens, embedding_dim)
    let batch_tokens = Array3::from_shape_vec((100, 32, 768), vec![0.1; 100 * 32 * 768]).unwrap();
    
    // Encode entire batch
    let batch_fdes = encoder.encode_query_batch(batch_tokens.view());
    
    println!("Encoded batch shape: {:?}", batch_fdes.shape());
}

Custom Aggregation

use muvera_rs::encoder::fde_encoder::{FDEEncoder, FDEEncoding};
use muvera_rs::types::Aggregation;
use ndarray::Array2;

fn custom_encoding() {
    let encoder = FDEEncoder::new(128, 768, 42);
    let tokens = Array2::from_shape_vec((32, 768), vec![0.1; 32 * 768]).unwrap();
    
    // Use custom aggregation mode
    let fde_sum = encoder.encode(tokens.view(), Aggregation::Sum);
    let fde_avg = encoder.encode(tokens.view(), Aggregation::Avg);
}

API Reference

FDEEncoder<T>

The main encoder struct that implements the Fixed Dimensional Encoding algorithm.

Constructor

pub fn new(buckets: usize, dim: usize, seed: u64) -> Self
  • buckets: Number of hash buckets (hyperplanes)
  • dim: Dimensionality of input token embeddings
  • seed: Random seed for reproducible hyperplane initialization

Methods

  • encode(tokens, mode): Encode single multi-vector
  • batch_encode(tokens, mode): Encode batch of multi-vectors (parallel)
  • encode_query(tokens): Encode query with sum aggregation
  • encode_doc(tokens): Encode document with average aggregation
  • encode_query_batch(tokens): Batch encode queries
  • encode_doc_batch(tokens): Batch encode documents

Aggregation

Enum defining aggregation modes:

  • Sum: Sum all tokens in each bucket
  • Avg: Average all tokens in each bucket
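The difference between the two modes, assuming tokens have already been assigned to a bucket (a sketch of the concept, not the crate's internals):

```rust
// Aggregate the tokens that fell into one bucket either by summing
// them coordinate-wise (Sum) or by dividing that sum by the token
// count (Avg). Queries use Sum; documents use Avg.
fn aggregate(bucket_tokens: &[Vec<f32>], dim: usize, average: bool) -> Vec<f32> {
    let mut out = vec![0.0f32; dim];
    for t in bucket_tokens {
        for (o, v) in out.iter_mut().zip(t) {
            *o += v;
        }
    }
    if average && !bucket_tokens.is_empty() {
        let n = bucket_tokens.len() as f32;
        for o in &mut out {
            *o /= n;
        }
    }
    out
}

fn main() {
    let bucket = vec![vec![1.0, 2.0], vec![3.0, 4.0]];
    println!("{:?}", aggregate(&bucket, 2, false)); // Sum -> [4.0, 6.0]
    println!("{:?}", aggregate(&bucket, 2, true));  // Avg -> [2.0, 3.0]
}
```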

Performance Considerations

  • Buckets: More buckets = better approximation but higher memory usage
  • Batch Size: Use batch encoding for multiple vectors to leverage parallelism
  • Memory: FDE vectors are buckets * dim in size
  • Precision: f32 is typically sufficient and faster than f64
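As a concrete check on the memory point: with the 128 buckets and 768 dimensions used in the examples above, each f32 FDE occupies 128 * 768 * 4 bytes, roughly 384 KiB. A back-of-the-envelope helper (hypothetical, not part of the crate's API):

```rust
// Bytes occupied by one FDE vector: buckets * dim entries,
// each elem_size bytes (4 for f32, 8 for f64).
fn fde_bytes(buckets: usize, dim: usize, elem_size: usize) -> usize {
    buckets * dim * elem_size
}

fn main() {
    // 128 buckets, 768-dim, f32 -> 393,216 bytes (~384 KiB) per vector.
    println!("{}", fde_bytes(128, 768, 4));
}
```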

References

Original Paper

  • MuVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings
    Laxman Dhulipala, Jason Lee, Majid Hadian, Rajesh Jayaram, Vahab Mirrokni
    arXiv:2405.19504

Google Research Blog

Original Implementation

Future Work

Planned Features

  1. Benchmark Suite

    • Integration with BEIR datasets (MS MARCO, etc.)
    • Memory usage and compression analysis
  2. Advanced Features

    • Support BLAS

    • Product Quantization (PQ): 32x compression with minimal quality loss

    • Final Projections: Dimensionality reduction techniques

License

This project is licensed under the MIT License - see the LICENSE file for details.
