| Crates.io | diskann |
| lib.rs | diskann |
| version | 0.1.0 |
| created_at | 2025-08-03 07:23:13.887316+00 |
| updated_at | 2025-08-03 07:23:13.887316+00 |
| description | High-performance approximate nearest neighbor (ANN) search library implemented in Rust |
| homepage | |
| repository | https://github.com/infinilabs/diskann |
| max_upload_size | |
| id | 1779436 |
| size | 14,388,554 |
The core DiskANN library providing high-performance approximate nearest neighbor (ANN) search functionality in Rust.
DiskANN is a high-performance, scalable approximate nearest neighbor (ANN) search library implemented in Rust. It provides both in-memory and disk-based indexing for large-scale vector search with high recall and low latency.
use diskann::{IndexBuilder, Metric, ANNResult};

fn main() -> ANNResult<()> {
    // Create an in-memory index
    let mut index = IndexBuilder::new()
        .with_dimension(4)           // Vector dimension (must match the vectors below)
        .with_metric(Metric::L2)     // Distance metric
        .with_max_degree(64)         // Maximum graph degree
        .with_search_list_size(100)  // Search beam width
        .with_alpha(1.2)             // Graph density parameter
        .with_num_threads(4)         // Number of threads
        .build_in_memory::<f32>()?;

    // Insert vectors
    let vectors = vec![
        vec![1.0, 2.0, 3.0, 4.0],
        vec![2.0, 3.0, 4.0, 5.0],
        vec![3.0, 4.0, 5.0, 6.0],
    ];
    index.insert_batch(&vectors)?;

    // Build the index
    index.build(&vectors)?;

    // Search for nearest neighbors
    let query = vec![1.0, 2.0, 3.0, 4.0];
    let results = index.search(&query, 5, 50)?;

    println!("Found {} nearest neighbors", results.len());
    for result in results {
        println!("ID: {}, Distance: {:.4}", result.id, result.distance);
    }

    Ok(())
}
IndexBuilder is the main entry point for creating indices, using a fluent builder pattern:
let index = IndexBuilder::new()
    .with_dimension(128)         // Required: vector dimension
    .with_metric(Metric::L2)     // Required: distance metric
    .with_max_degree(64)         // Optional: max graph degree (default: 64)
    .with_search_list_size(100)  // Optional: search beam width (default: 100)
    .with_alpha(1.2)             // Optional: graph density (default: 1.2)
    .with_num_threads(4)         // Optional: thread count (default: 1)
    .with_opq(false)             // Optional: enable OPQ (Optimized Product Quantization) compression (default: false)
    .build_in_memory::<f32>()?;  // Build in-memory index
Available distance metrics:
- Metric::L2 - Euclidean distance (fastest, most common)
- Metric::Cosine - Cosine distance (for normalized vectors)
- Metric::InnerProduct - Inner product similarity
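For intuition, here is a plain-Rust sketch of what each metric computes for a pair of vectors. It is illustrative only and is not the crate's internal implementation:

// Illustrative only: these are not the crate's internal distance kernels.
fn l2_distance(a: &[f32], b: &[f32]) -> f32 {
    // Euclidean distance: square root of the sum of squared differences
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f32>().sqrt()
}

fn inner_product(a: &[f32], b: &[f32]) -> f32 {
    // Dot product; larger means more similar
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    // 1 - cosine similarity; for unit-length (normalized) vectors this reduces
    // to 1 - inner product, which is why Cosine expects normalized vectors.
    let norm = |v: &[f32]| v.iter().map(|x| x * x).sum::<f32>().sqrt();
    1.0 - inner_product(a, b) / (norm(a) * norm(b))
}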
Fine-tune search behavior:

let results = index.search(&query, k, l)?;
// k: number of results to return
// l: search beam width (higher = more accurate, slower)
For finer control, SearchParams bundles these options:

use diskann::{SearchParams, ANNResult};

let params = SearchParams {
    k: 10,                   // Number of results
    l: 50,                   // Search beam width
    return_distances: true,  // Include distances in results
};

let results = index.search_with_params(&query, &params)?;
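The search list size l is the main recall/latency knob. A minimal sketch of sweeping it, assuming index and query were built as in the quick-start example above:

// Trade accuracy for speed by varying the beam width `l` (k fixed at 10).
// Assumes `index` and `query` exist as in the quick-start example.
for l in [20, 50, 100, 200] {
    let start = std::time::Instant::now();
    let results = index.search(&query, 10, l)?;
    println!("l = {:>3}: {} results in {:?}", l, results.len(), start.elapsed());
}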
In-memory index, best for datasets that fit in RAM and need the lowest search latency:

let mut index = IndexBuilder::new()
    .with_dimension(128)
    .with_metric(Metric::L2)
    .build_in_memory::<f32>()?;
Disk-based index, best for datasets too large to fit in memory; it trades slightly higher search latency for a much smaller memory footprint (see the performance table below):

let mut index = IndexBuilder::new()
    .with_dimension(128)
    .with_metric(Metric::L2)
    .build_disk_index::<f32>("index_path")?;
// Load image embeddings
let embeddings = load_image_embeddings("images.bin")?;
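// Hypothetical addition (not in the original example): Metric::Cosine assumes
// unit-length vectors, so L2-normalize each embedding before indexing.
// Assumes `embeddings` is a Vec<Vec<f32>>, as in the quick-start example;
// skip this if load_image_embeddings already returns normalized vectors.
let embeddings: Vec<Vec<f32>> = embeddings
    .into_iter()
    .map(|v| {
        let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt().max(f32::EPSILON);
        v.into_iter().map(|x| x / norm).collect()
    })
    .collect();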
let mut index = IndexBuilder::new()
    .with_dimension(512)          // Image embedding dimension
    .with_metric(Metric::Cosine)  // Cosine for normalized embeddings
    .build_in_memory::<f32>()?;
index.insert_batch(&embeddings)?;
index.build(&embeddings)?;
// Find similar images
let query_embedding = extract_image_embedding("query.jpg")?;
let similar_images = index.search(&query_embedding, 10, 50)?;
// Load text embeddings
let text_embeddings = load_text_embeddings("documents.bin")?;
let mut index = IndexBuilder::new()
    .with_dimension(768)  // BERT embedding dimension
    .with_metric(Metric::L2)
    .build_in_memory::<f32>()?;
index.insert_batch(&text_embeddings)?;
index.build(&text_embeddings)?;
// Semantic search
let query_embedding = embed_text("search query")?;
let relevant_docs = index.search(&query_embedding, 20, 100)?;
// For very large datasets, use disk-based index
let mut index = IndexBuilder::new()
    .with_dimension(128)
    .with_metric(Metric::InnerProduct)  // For recommendation scores
    .with_max_degree(128)
    .with_search_list_size(200)
    .with_num_threads(16)
    .build_disk_index::<f32>("recommendations")?;
// Build from file
index.build_from_file("user_embeddings.bin")?;
// Get recommendations
let user_embedding = get_user_embedding(user_id)?;
let recommendations = index.search(&user_embedding, 50, 200)?;
Performance on typical hardware (Intel i7-8700K, 32GB RAM):
| Dataset Size | Index Type | Build Time | Search Time (ms) | Memory Usage |
|---|---|---|---|---|
| 100K vectors | In-Memory | 2.3s | 0.8 | 512MB |
| 1M vectors | In-Memory | 18.7s | 1.2 | 4.2GB |
| 10M vectors | Disk-Based | 3m 45s | 2.1 | 1.1GB |
| 100M vectors | Disk-Based | 42m 12s | 3.8 | 2.3GB |
Search times are for k=10, l=50 on 128-dimensional vectors.
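These figures depend on hardware and build parameters. A rough sketch for measuring average search latency on your own data, assuming an index built as shown above and a queries batch (Vec<Vec<f32>>) loaded elsewhere:

use std::time::Instant;

// Average search latency over a query batch (k = 10, l = 50, as in the table).
// `index` and `queries` are placeholders built/loaded elsewhere.
let start = Instant::now();
for query in &queries {
    let _ = index.search(query, 10, 50)?;
}
let per_query = start.elapsed() / queries.len() as u32;
println!("avg search latency: {:?}", per_query);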
# Build the library in release mode
cargo build --release

# Run the test suite
cargo test

# Run tests in the example targets
cargo test --examples

# Run benchmarks
cargo bench
This project is licensed under the MIT License - see the LICENSE file for details.
We welcome contributions! Please see the main README for contribution guidelines.