Crates.io | metrovector |
lib.rs | metrovector |
version | 0.1.0 |
created_at | 2025-06-11 14:42:10.549004+00 |
updated_at | 2025-06-11 14:42:10.549004+00 |
description | A high-performance, compact binary format for storing and querying vector embeddings. |
repository | https://github.com/thegenem0/metrovector |
id | 1708650 |
size | 196,115 |
A high-performance, compact binary format for storing and querying vector embeddings, designed as a foundational building block for vector databases.
MVF (Metro Vector Format) is a binary file format optimized for storing large collections of high-dimensional vectors with associated metadata. It provides memory-efficient storage, fast random access, and support for various vector data types commonly used in machine learning and AI applications.
┌─────────────────────────────────────────────────────────────┐
│                     MVF File Structure                      │
├─────────────────────────────────────────────────────────────┤
│ Magic Header (4B) │ Vector Data Blocks │ Footer │ Magic (4B)│
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
              ┌─────────────────────────────────┐
              │        Vector Data Block        │
              ├─────────────────────────────────┤
              │ Vector 0: [1.0, 2.0, 3.0, ...]  │
              │ Vector 1: [4.0, 5.0, 6.0, ...]  │
              │ Vector 2: [7.0, 8.0, 9.0, ...]  │
              │ ...                             │
              └─────────────────────────────────┘
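The layout above hints at why random access can be cheap: if a block stores fixed-dimension dense vectors back to back, the byte position of vector `i` is a single multiplication. The following is a minimal sketch of that arithmetic, not the crate's actual reader. It assumes a 4-byte magic header followed directly by one block of little-endian `f32` values; `HEADER_LEN`, `vector_offset`, and `read_vector` are hypothetical names introduced only for illustration.

```rust
/// Assumed size of the leading magic header, taken from the "(4B)" in the diagram.
const HEADER_LEN: usize = 4;

/// Byte offset of vector `index`, assuming fixed-dimension `f32` vectors
/// stored contiguously right after the header (an assumption, not the spec).
fn vector_offset(index: usize, dimension: usize) -> usize {
    HEADER_LEN + index * dimension * std::mem::size_of::<f32>()
}

/// Decode vector `index` from a raw byte buffer, or `None` if it is out of bounds.
fn read_vector(bytes: &[u8], index: usize, dimension: usize) -> Option<Vec<f32>> {
    let start = vector_offset(index, dimension);
    let end = start.checked_add(dimension * std::mem::size_of::<f32>())?;
    // `get` returns None instead of panicking when the range exceeds the buffer.
    let raw = bytes.get(start..end)?;
    Some(
        raw.chunks_exact(4)
            .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}
```

Because the offset depends only on the index and the dimension, no scan or per-vector index table is needed under these assumptions.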
Add this to your Cargo.toml:
[dependencies]
metrovector-format = "0.1.0"
use metrovector_format::{
    builder::MvfBuilder,
    mvf_fbs::{DataType, DistanceMetric, VectorType},
};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create some sample vectors
    let vectors = vec![
        vec![1.0, 2.0, 3.0, 4.0],
        vec![5.0, 6.0, 7.0, 8.0],
        vec![9.0, 10.0, 11.0, 12.0],
    ];

    // Build the MVF file
    let mut builder = MvfBuilder::new();
    builder.add_vector_space(
        "embeddings",              // Space name
        4,                         // Dimensions
        VectorType::Dense,         // Vector type
        DistanceMetric::Euclidean, // Distance metric
        DataType::Float32,         // Data type
    );
    builder.add_vectors("embeddings", &vectors)?;
    let built_mvf = builder.build();

    // Write to file
    built_mvf.save("vectors.mvf")?;
    Ok(())
}
use metrovector_format::MvfReader;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Open the MVF file
    let mvf_file = MvfReader::open("vectors.mvf")?;
    println!("File version: {}", mvf_file.version());
    println!("Vector spaces: {}", mvf_file.vector_spaces().len());

    // Get the first vector space by name
    let space = mvf_file.vector_space(mvf_file.vector_space_names().first().unwrap())?;
    println!(
        "Space '{}': {} vectors, {} dimensions",
        space.name(),
        space.total_vectors(),
        space.dimension()
    );

    // Read vectors
    for i in 0..space.total_vectors() {
        let vector = space.get_vector(i)?;
        let data = vector.as_f32()?;
        println!("Vector {}: {:?}", i, data);
    }
    Ok(())
}
The examples/ directory contains comprehensive examples:
# Basic usage
cargo run --example simple
# Working with different data types
cargo run --example data_types
# Performance benchmarking with large datasets
# Note: generating the data requires as much RAM as the dataset size
cargo run --example large_dataset -- --size 4gb
# Similarity search
cargo run --example similarity_search
Performance characteristics on modern hardware:
Operation | Throughput (vectors/sec) | Latency |
---|---|---|
Sequential Read | ~1.5M | ~0.5μs per vector |
Random Access | ~500K | ~2μs per vector |
File Opening | - | ~10ms (any size) |
Memory Usage | ~0 (memory mapped) | - |
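Figures like these depend heavily on hardware, vector dimension, and cache state, so it is worth measuring on your own data. The sketch below is a generic timing helper that uses only the standard library; it is not the project's benchmark harness, and `Throughput` and `measure` are hypothetical names. The closure stands in for whatever per-vector read you want to time.

```rust
use std::time::Instant;

/// Result of a timing run: items per second and mean microseconds per item.
struct Throughput {
    per_sec: f64,
    micros_per_item: f64,
}

/// Call `op` once for each index in `0..count` and report the average rate.
fn measure<F: FnMut(usize)>(count: usize, mut op: F) -> Throughput {
    let start = Instant::now();
    for i in 0..count {
        op(i);
    }
    // Clamp the elapsed time to avoid dividing by zero on very fast runs.
    let secs = start.elapsed().as_secs_f64().max(1e-9);
    Throughput {
        per_sec: count as f64 / secs,
        micros_per_item: secs * 1e6 / count.max(1) as f64,
    }
}
```

Passing indices through in order approximates the sequential-read row; mapping each `i` to a shuffled position inside the closure approximates random access. Wrapping the read in `std::hint::black_box` helps keep the compiler from optimizing it away.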
This is a high-level overview of the API.
L2: Euclidean distance
Cosine: Cosine similarity
Dot: Dot product
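For reference, the three metrics listed above can be written out over plain `f32` slices. This is a minimal sketch of the standard textbook definitions, not the crate's implementation; the function names are hypothetical, and returning 0.0 for a zero-length vector in `cosine_similarity` is a convention chosen here.

```rust
/// Euclidean (L2) distance: square root of the summed squared differences.
fn l2_distance(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f32>()
        .sqrt()
}

/// Dot product: sum of element-wise products.
fn dot_product(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Cosine similarity: dot product divided by the product of the norms.
/// Returns 0.0 when either vector has zero norm (a convention chosen here).
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let norm_a = dot_product(a, a).sqrt();
    let norm_b = dot_product(b, b).sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot_product(a, b) / (norm_a * norm_b)
}
```

Note the difference in direction: a smaller L2 distance means more similar vectors, while larger cosine and dot-product values mean more similar vectors.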
Run tests with:
cargo nextest run
Get test coverage with:
cargo llvm-cov nextest --html
Contributions are welcome! Please open an issue or submit a pull request.
This project is licensed under the MIT license.
Built with ❤️ in Rust for the vector AI community.