diskann-vector

Crate: diskann-vector
Version: 0.1.0
Created: 2025-08-03 07:16:58.810087+00
Updated: 2025-08-03 07:16:58.810087+00
Description: Vector operations and distance metrics for DiskANN, supporting f32, f16, and various distance functions
Repository: https://github.com/infinilabs/diskann
ID: 1779429
Size: 48,204
Medcl (medcl)

Vector Library

A high-performance vector operations library providing distance metrics, data types, and utilities for vector computations in Rust.

Overview

The Vector library supplies the distance metrics, data types, and utilities that the DiskANN project uses for vector operations. It targets high-performance computation over fixed-size vectors and supports several numeric types and distance functions.

Features

  • Multiple distance metrics - L2, cosine, inner product, and more
  • Flexible data types - Support for f32, f16, and other numeric types
  • High performance - Optimized vector operations
  • Type safety - Vector dimensions are part of the type (fixed-size arrays such as [f32; 128])
  • SIMD support - Vectorized operations where available
  • Cross-platform - Works on multiple architectures

Quick Start

use vector::{Metric, FullPrecisionDistance};

// Define a vector type with specific dimension
type Vector128 = [f32; 128];

// Create vectors
let v1: Vector128 = [1.0; 128];
let v2: Vector128 = [2.0; 128];

// Calculate distance using different metrics
let l2_distance = v1.l2_distance(&v2);
let cosine_distance = v1.cosine_distance(&v2);
let inner_product = v1.inner_product(&v2);

println!("L2 distance: {}", l2_distance);
println!("Cosine distance: {}", cosine_distance);
println!("Inner product: {}", inner_product);

Distance Metrics

L2 Distance (Euclidean)

use vector::FullPrecisionDistance; // the trait that provides l2_distance()

let v1 = [1.0, 2.0, 3.0];
let v2 = [4.0, 5.0, 6.0];

let distance = v1.l2_distance(&v2);
// Calculates: sqrt((4-1)² + (5-2)² + (6-3)²)
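
The formula in the comment can be checked without the library. The following is a minimal sketch in plain Rust; `l2_distance_ref` is a hypothetical helper name, not part of the crate's API, and it assumes the square-root (non-squared) form of L2 shown above.

```rust
/// Reference Euclidean (L2) distance over two equal-length slices.
/// Hypothetical helper for illustration; not the crate's implementation.
fn l2_distance_ref(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same length");
    a.iter()
        .zip(b.iter())
        // Squared difference per component.
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f32>()
        .sqrt()
}
```

For v1 = [1.0, 2.0, 3.0] and v2 = [4.0, 5.0, 6.0] this evaluates sqrt(9 + 9 + 9) = sqrt(27), roughly 5.196.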

Cosine Distance

let v1 = [1.0, 2.0, 3.0];
let v2 = [4.0, 5.0, 6.0];

let distance = v1.cosine_distance(&v2);
// Calculates: 1 - (v1·v2) / (||v1|| * ||v2||)
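
As a cross-check of the formula, here is a small standalone sketch. The name `cosine_distance_ref` and the choice to return `None` for zero-length vectors are assumptions made for illustration; the crate may handle that case differently.

```rust
/// Reference cosine distance: 1 - (a·b) / (||a|| * ||b||).
/// Hypothetical helper; returns None when either vector has zero norm,
/// because the cosine is undefined in that case.
fn cosine_distance_ref(a: &[f32], b: &[f32]) -> Option<f32> {
    assert_eq!(a.len(), b.len(), "vectors must have the same length");
    let mut dot = 0.0_f32;
    let mut norm_a = 0.0_f32;
    let mut norm_b = 0.0_f32;
    // Accumulate the dot product and both squared norms in one pass.
    for (x, y) in a.iter().zip(b.iter()) {
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }
    let denom = (norm_a * norm_b).sqrt();
    if denom == 0.0 {
        None
    } else {
        Some(1.0 - dot / denom)
    }
}
```

With v1 = [1.0, 2.0, 3.0] and v2 = [4.0, 5.0, 6.0], the dot product is 32 and the norms are sqrt(14) and sqrt(77), so the distance is 1 - 32 / sqrt(1078), roughly 0.0254.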

Inner Product

let v1 = [1.0, 2.0, 3.0];
let v2 = [4.0, 5.0, 6.0];

let similarity = v1.inner_product(&v2);
// Calculates: v1·v2 = Σ(v1[i] * v2[i])
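
The sum in the comment can be written directly as an iterator chain. This is an illustrative sketch only; `inner_product_ref` is a hypothetical name and not the crate's function.

```rust
/// Reference inner (dot) product: the sum of a[i] * b[i] over all i.
/// Hypothetical helper for illustration only.
fn inner_product_ref(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same length");
    a.iter().zip(b.iter()).map(|(x, y)| x * y).sum()
}
```

For the vectors above this gives 1*4 + 2*5 + 3*6 = 32. Unlike the two distances, a larger inner product indicates more similar vectors.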

Data Types

Supported Types

  • f32 - 32-bit floating point (most common)
  • f16 - 16-bit floating point (memory efficient)
  • f64 - 64-bit floating point (high precision)
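
To make the memory trade-off between these types concrete, the sketch below estimates raw storage for a set of vectors. `storage_bytes` is a hypothetical helper, and `u16` stands in for a 16-bit float because the standard library has no stable f16 type.

```rust
use std::mem::size_of;

/// Bytes needed to store `count` vectors of `dim` elements of type T,
/// ignoring any index or allocator overhead.
/// Hypothetical helper for estimating memory use.
fn storage_bytes<T>(count: usize, dim: usize) -> usize {
    count * dim * size_of::<T>()
}
```

Under these assumptions, one million 128-dimensional vectors take 512,000,000 bytes as f32, 256,000,000 bytes in a 16-bit format, and 1,024,000,000 bytes as f64.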

Type Conversion

use vector::Half;

// Convert between types
let f32_vector: [f32; 128] = [1.0; 128];
let f16_vector: [Half; 128] = f32_vector.map(Half::from_f32);

// Convert back
let back_to_f32: [f32; 128] = f16_vector.map(|x| x.to_f32());

Dimension Support

The library supports fixed-size arrays for different dimensions:

// Common dimensions
type Vector64 = [f32; 64];
type Vector128 = [f32; 128];
type Vector256 = [f32; 256];
type Vector512 = [f32; 512];

// Custom dimensions
type CustomVector = [f32; 1024];
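
Fixed-size arrays let the compiler check dimensions. The sketch below shows the idea with a const generic parameter; `squared_l2` is a hypothetical function, not the crate's API.

```rust
/// Squared L2 distance for fixed-size arrays of any dimension N.
/// Hypothetical helper: both arguments must share the same N,
/// so a dimension mismatch is rejected at compile time.
fn squared_l2<const N: usize>(a: &[f32; N], b: &[f32; N]) -> f32 {
    a.iter().zip(b.iter()).map(|(x, y)| (x - y) * (x - y)).sum()
}
```

Calling `squared_l2(&[0.0; 64], &[1.0; 64])` compiles and returns 64.0, whereas passing a `[f32; 64]` together with a `[f32; 128]` fails to compile because the two array types differ.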

Performance Optimizations

SIMD Operations

The library automatically uses SIMD instructions when available:

// These operations are automatically vectorized
let v1: [f32; 128] = [1.0; 128];
let v2: [f32; 128] = [2.0; 128];

let distance = v1.l2_distance(&v2); // Uses SIMD if available
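
How the crate selects SIMD code paths is not described here. As a rough illustration of a loop shape that compilers can often vectorize on their own, the sketch below keeps eight independent accumulators; the function name and the lane count of 8 are assumptions, and the crate's actual implementation may look quite different.

```rust
/// L2 distance with eight independent accumulators, a layout that
/// compilers can often map onto SIMD lanes. Hypothetical sketch only.
fn l2_distance_chunked(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same length");
    const LANES: usize = 8;
    let mut acc = [0.0_f32; LANES];

    let a_chunks = a.chunks_exact(LANES);
    let b_chunks = b.chunks_exact(LANES);
    // Elements left over when the length is not a multiple of LANES.
    let a_tail = a_chunks.remainder();
    let b_tail = b_chunks.remainder();

    for (ca, cb) in a_chunks.zip(b_chunks) {
        for i in 0..LANES {
            let d = ca[i] - cb[i];
            acc[i] += d * d;
        }
    }

    // Combine the lane sums, then add the scalar tail.
    let mut sum: f32 = acc.iter().sum();
    for (x, y) in a_tail.iter().zip(b_tail.iter()) {
        sum += (x - y) * (x - y);
    }
    sum.sqrt()
}
```

Because the additions happen in a different order than in a simple left-to-right loop, results can differ from a scalar implementation in the last few bits.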

Memory Alignment

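For optimal performance, align vector buffers to the SIMD register width (32 bytes for 256-bit AVX registers):

use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};

// Allocate 1024 bytes (room for 256 f32 values) on a 32-byte boundary
let layout = Layout::from_size_align(1024, 32).unwrap();
let ptr = unsafe { alloc(layout) };
if ptr.is_null() {
    handle_alloc_error(layout); // alloc signals failure with a null pointer
}
// ... use the buffer, then release it with the same layout
unsafe { dealloc(ptr, layout) };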
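
When the dimension is known at compile time, alignment can also be requested without manual allocation. The sketch below uses `#[repr(align(32))]` on a wrapper type; `AlignedVector128` and `is_32_byte_aligned` are hypothetical names introduced for illustration, not types from the crate.

```rust
/// A 128-dimensional f32 vector whose storage is always 32-byte aligned.
/// Hypothetical wrapper type; not part of the crate's API.
#[repr(align(32))]
struct AlignedVector128([f32; 128]);

/// Returns true when the vector's first element sits on a 32-byte boundary.
fn is_32_byte_aligned(v: &AlignedVector128) -> bool {
    (v.0.as_ptr() as usize) % 32 == 0
}
```

The alignment attribute applies wherever the value lives, so both stack values and `Box<AlignedVector128>` allocations satisfy the check.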

Advanced Usage

Custom Distance Metrics

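use vector::FullPrecisionDistance;

// Wrap the array in a local newtype: Rust's orphan rule does not allow
// implementing a foreign trait directly on the built-in [f32; 128]
struct MyVector([f32; 128]);

// Implement custom distance for your type
impl FullPrecisionDistance<f32, 128> for MyVector {
    fn l2_distance(&self, other: &[f32; 128]) -> f32 {
        self.0
            .iter()
            .zip(other.iter())
            .map(|(a, b)| (a - b).powi(2))
            .sum::<f32>()
            .sqrt()
    }
}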

Batch Operations

use rayon::prelude::*;
use vector::FullPrecisionDistance;

let vectors: Vec<[f32; 128]> = vec![/* your vectors */];
let query: [f32; 128] = [0.0; 128]; // replace with your query vector

// Parallel distance calculation
let distances: Vec<f32> = vectors
    .par_iter()
    .map(|v| v.l2_distance(&query))
    .collect();
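
A typical next step after computing a batch of distances is to pick the closest vectors. The following is a sequential, standard-library-only sketch of a brute-force top-k search; `nearest_k` is a hypothetical helper and does not use rayon or the crate's traits.

```rust
/// Returns the indices and L2 distances of the `k` vectors closest to
/// `query`, nearest first. Hypothetical brute-force helper.
fn nearest_k<const N: usize>(
    vectors: &[[f32; N]],
    query: &[f32; N],
    k: usize,
) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = vectors
        .iter()
        .enumerate()
        .map(|(i, v)| {
            let dist = v
                .iter()
                .zip(query.iter())
                .map(|(a, b)| (a - b) * (a - b))
                .sum::<f32>()
                .sqrt();
            (i, dist)
        })
        .collect();
    // total_cmp gives a total order on f32, so NaN values cannot panic the sort.
    scored.sort_by(|x, y| x.1.total_cmp(&y.1));
    scored.truncate(k);
    scored
}
```

Sorting every distance costs O(n log n); for large collections a bounded heap or an approximate index such as DiskANN avoids scanning and sorting the full set.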

Integration with DiskANN

The Vector library is designed to work seamlessly with DiskANN:

use diskann::{IndexBuilder, Metric};
use vector::FullPrecisionDistance;

// Create index with vector types
let mut index = IndexBuilder::new()
    .with_dimension(128)
    .with_metric(Metric::L2)
    .build_in_memory::<f32>()?;

// Insert vectors
let vectors: Vec<[f32; 128]> = vec![/* your vectors */];
index.insert_batch(&vectors)?;

Benchmarks

Performance comparison of different distance metrics (Intel i7-8700K):

Metric          128-dim   256-dim   512-dim   1024-dim
L2              0.8μs     1.2μs     2.1μs     4.3μs
Cosine          1.1μs     1.8μs     3.2μs     6.1μs
Inner Product   0.6μs     1.0μs     1.8μs     3.5μs

Times are per vector-pair comparison.
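
For a rough idea of how a per-pair time can be measured, the sketch below times a plain scalar L2 loop with the standard library. `average_l2_time` is a hypothetical helper and is not how the figures above were produced; a proper harness run through `cargo bench` gives more reliable numbers.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Times `iters` L2 distance computations between two 128-dim vectors and
/// returns the average time per comparison. Hypothetical micro-benchmark.
fn average_l2_time(iters: u32) -> Duration {
    let a = [1.0_f32; 128];
    let b = [2.0_f32; 128];
    let start = Instant::now();
    for _ in 0..iters {
        // black_box keeps the optimizer from removing the loop body.
        let d: f32 = black_box(&a)
            .iter()
            .zip(black_box(&b).iter())
            .map(|(x, y)| (x - y) * (x - y))
            .sum::<f32>()
            .sqrt();
        black_box(d);
    }
    // Guard against division by zero when iters is 0.
    start.elapsed() / iters.max(1)
}
```

Results depend heavily on the CPU, compiler flags, and build profile, so measurements are only comparable when taken with `--release` on the same machine.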

Development

Building

cargo build --release

Testing

cargo test
cargo test --benches

Benchmarks

cargo bench

API Reference

Core Traits and Types

  • FullPrecisionDistance<T, DIM> - Distance calculation trait
  • Metric - Distance metric enumeration
  • Half - 16-bit floating point type

Main Functions

  • l2_distance() - Calculate L2 distance
  • cosine_distance() - Calculate cosine distance
  • inner_product() - Calculate inner product
  • normalize() - Normalize vector to unit length
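
As an illustration of what normalizing to unit length involves, here is a standalone sketch. The name `normalize_in_place`, its in-place signature, and its treatment of the zero vector are assumptions; the crate's `normalize()` may behave differently.

```rust
/// Scales `v` in place to unit L2 norm and returns the original norm.
/// Hypothetical helper; a zero vector is left unchanged because it has
/// no direction to preserve.
fn normalize_in_place(v: &mut [f32]) -> f32 {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
    norm
}
```

For example, [3.0, 4.0] has norm 5 and becomes [0.6, 0.8]. On unit vectors, cosine distance reduces to 1 minus the inner product.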

Utility Functions

  • round_up() - Round up to nearest multiple
  • is_floating_point() - Check if type is floating point
  • get_distance_function() - Get distance function for metric
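
The signatures of these utilities are not shown in this README. The sketch below illustrates what rounding up to a multiple and selecting a distance function by metric could look like; every name and signature here (`round_up`, `MetricKind`, `DistanceFn`, `distance_function_for`) is a hypothetical stand-in, not the crate's API.

```rust
/// Rounds `value` up to the nearest multiple of `multiple`.
/// Hypothetical signature; the crate's round_up() may differ.
fn round_up(value: usize, multiple: usize) -> usize {
    assert!(multiple > 0, "multiple must be non-zero");
    (value + multiple - 1) / multiple * multiple
}

/// Hypothetical stand-in for a metric enumeration.
#[derive(Clone, Copy, Debug, PartialEq)]
enum MetricKind {
    L2,
    InnerProduct,
}

/// Function pointer type shared by all distance functions in this sketch.
type DistanceFn = fn(&[f32], &[f32]) -> f32;

fn l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>().sqrt()
}

fn inner_product(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Picks the distance function once, so hot loops avoid a match per call.
fn distance_function_for(metric: MetricKind) -> DistanceFn {
    match metric {
        MetricKind::L2 => l2,
        MetricKind::InnerProduct => inner_product,
    }
}
```

One plausible use of rounding is padding a dimension to a SIMD-friendly size: `round_up(100, 32)` gives 128, while `round_up(128, 32)` stays at 128.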

Dependencies

  • rayon - Parallel processing
  • half - 16-bit floating point support
  • bytemuck - Memory operations
  • serde - Serialization (optional)

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

We welcome contributions! Please see the main README for contribution guidelines.
