| Crates.io | swarc |
| lib.rs | swarc |
| version | 0.1.0 |
| created_at | 2025-09-30 21:29:27.187608+00 |
| updated_at | 2025-09-30 21:29:27.187608+00 |
| description | Small World Approximate Recall Crate - A high-performance HNSW implementation in Rust |
| homepage | |
| repository | https://github.com/carlosbertoncelli/swarc |
| max_upload_size | |
| id | 1861758 |
| size | 149,091 |
A high-performance implementation of the Hierarchical Navigable Small World (HNSW) algorithm in Rust. SWARC provides state-of-the-art approximate nearest neighbor search with excellent performance for high-dimensional vector similarity search.
HNSW constructs a multi-layer graph in which the upper layers contain progressively fewer nodes and act as coarse entry points for routing, while the bottom layer (layer 0) contains every node for fine-grained search. A query descends greedily from the top layer down to layer 0, narrowing in on the nearest neighbors at each step.
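Each inserted node is assigned a maximum layer at random; in the original HNSW paper this follows a geometric distribution with normalization factor m_L = 1/ln(m). The sketch below illustrates that standard rule only; the function name `random_level` and the explicit `uniform` parameter are illustrative, and SWARC's internal level assignment may differ.

```rust
// Standard HNSW level rule (Malkov & Yashunin): P(level >= l) = p^l,
// with normalization m_L = 1 / ln(m). Illustrative sketch only --
// SWARC's actual level assignment is internal and may differ.
fn random_level(m: usize, uniform: f64) -> usize {
    let m_l = 1.0 / (m as f64).ln();
    (-uniform.ln() * m_l).floor() as usize
}

fn main() {
    // With m = 16, most draws land on layer 0; rare draws reach higher layers.
    assert_eq!(random_level(16, 0.9), 0);
    assert_eq!(random_level(16, 0.001), 2);
    println!("ok");
}
```

Because levels decay geometrically, the expected number of nodes per layer shrinks exponentially, which is what gives HNSW its logarithmic search behavior.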
Add this to your Cargo.toml:
```toml
[dependencies]
swarc = { path = "path/to/swarc" }
```
Or if using from crates.io (when published):
```toml
[dependencies]
swarc = "0.1.0"
```
```rust
use swarc::{HNSWIndex, Document};
use rand::Rng;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create a new HNSW index.
    // Parameters: dimension, max_connections, ef_construction
    let mut index = HNSWIndex::new(128, 16, 200);

    // Create a document with external data.
    let document = Document {
        id: "doc1".to_string(),
        data: "This is my document content".to_string(),
    };

    // Generate a random embedding (in practice, use your embedding model).
    let mut rng = rand::thread_rng();
    let embedding: Vec<f32> = (0..128).map(|_| rng.gen_range(-1.0..1.0)).collect();

    // Insert the document into the index.
    index.insert("node1".to_string(), embedding, Some(document))?;

    // Search for the 5 nearest neighbors.
    let query: Vec<f32> = (0..128).map(|_| rng.gen_range(-1.0..1.0)).collect();
    let results = index.search(&query, 5);
    for (id, distance, document) in results {
        println!("ID: {}, Distance: {:.4}, Document: {:?}", id, distance, document);
    }

    // Remove a document.
    let removed_doc = index.remove("node1")?;
    println!("Removed: {:?}", removed_doc);

    Ok(())
}
```
`Document<T>`

```rust
pub struct Document<T> {
    pub id: String,
    pub data: T,
}
```

A wrapper for external data associated with embeddings.
`HNSWIndex<T>`

The main index structure that provides all HNSW operations.

`new(dim: usize, m: usize, ef_construction: usize) -> HNSWIndex<T>`

Creates a new HNSW index.

- `dim`: Dimensionality of the embedding vectors
- `m`: Maximum number of connections per node (except layer 0)
- `ef_construction`: Size of the dynamic candidate list during construction

`insert(id: String, embedding: Vec<f32>, document: Option<Document<T>>) -> Result<(), String>`

Inserts a new node into the index.

- `id`: Unique identifier for the node
- `embedding`: The vector embedding
- `document`: Optional associated document data

`search(query: &[f32], k: usize) -> Vec<(String, f32, Option<&Document<T>>)>`

Searches for the k nearest neighbors.

- `query`: The query vector
- `k`: Number of nearest neighbors to return

`remove(id: &str) -> Result<Option<Document<T>>, String>`

Removes a node from the index.

- `id`: The node identifier to remove

`rebalance() -> Result<(), String>`

Rebalances the index structure (currently a placeholder for future enhancements).

Utility methods:

- `len() -> usize`: Get the number of nodes in the index
- `is_empty() -> bool`: Check if the index is empty
- `get_node(id: &str) -> Option<&HNSWNode<T>>`: Get a node by ID
- `get_all_ids() -> Vec<String>`: Get all node IDs
- `contains(id: &str) -> bool`: Check if a node exists
- `clear()`: Remove all nodes from the index
- `remove_multiple(ids: &[&str]) -> Result<Vec<Option<Document<T>>>, String>`: Remove multiple nodes

The implementation is organized into several modules:

- `types.rs`: Core data structures (`Document`, `HNSWNode`)
- `index.rs`: Main index structure and basic operations
- `insert.rs`: Insertion and rebalancing logic
- `search.rs`: Search and nearest neighbor algorithms
- `remove.rs`: Node removal and cleanup operations

SWARC has been benchmarked with high-dimensional embeddings (3072 dimensions) across various dataset sizes. See the performance tests directory for detailed benchmarks and visualizations.
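The core idea behind the search module, greedy descent through a proximity graph, can be sketched independently of SWARC's API. This is a deliberately simplified single-layer version using squared Euclidean distance; the real HNSW algorithm also maintains an `ef`-sized candidate list and runs across multiple layers, and the names here are illustrative, not SWARC's internals.

```rust
// Greedy nearest-neighbor descent on one graph layer: repeatedly move to
// whichever neighbor is closest to the query, stopping when no neighbor
// improves on the current node. Simplified sketch -- real HNSW search also
// keeps an ef-sized candidate list to avoid local minima.
fn greedy_search(
    graph: &[Vec<usize>],  // adjacency list: graph[i] = neighbors of node i
    points: &[Vec<f32>],   // coordinates of each node
    query: &[f32],
    entry: usize,          // entry point (top-layer result in real HNSW)
) -> usize {
    // Squared Euclidean distance from node i to the query.
    let dist = |i: usize| -> f32 {
        points[i].iter().zip(query).map(|(a, b)| (a - b) * (a - b)).sum()
    };
    let mut current = entry;
    let mut best = dist(current);
    loop {
        let mut improved = false;
        for &n in &graph[current] {
            let d = dist(n);
            if d < best {
                best = d;
                current = n;
                improved = true;
            }
        }
        if !improved {
            return current;
        }
    }
}

fn main() {
    // A small path graph 0 - 1 - 2 - 3 over 1-D points.
    let graph = vec![vec![1], vec![0, 2], vec![1, 3], vec![2]];
    let points = vec![vec![0.0], vec![1.0], vec![2.0], vec![3.0]];
    // Starting at node 0, the walk descends to node 3, the closest to 2.9.
    assert_eq!(greedy_search(&graph, &points, &[2.9], 0), 3);
    println!("ok");
}
```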
Key performance metrics:

- Insertion time scales logarithmically with dataset size
- Search time remains relatively constant across dataset sizes
- Memory usage scales linearly with dataset size

(Figure: insertion throughput across different dataset sizes)
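When reproducing benchmarks like these, the usual quality metric for an approximate index is recall against an exact brute-force search over the same data. The baseline below is a self-contained sketch in plain Rust, assuming Euclidean distance; it does not use SWARC's API, and `exact_knn` is an illustrative name, not a crate function.

```rust
// Brute-force exact k-NN by Euclidean distance, usable as ground truth
// when measuring an approximate index's recall.
fn euclidean(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>().sqrt()
}

// Returns the indices of the k vectors closest to `query`, nearest first.
fn exact_knn(vectors: &[Vec<f32>], query: &[f32], k: usize) -> Vec<usize> {
    let mut order: Vec<usize> = (0..vectors.len()).collect();
    order.sort_by(|&i, &j| {
        euclidean(&vectors[i], query)
            .partial_cmp(&euclidean(&vectors[j], query))
            .unwrap()
    });
    order.truncate(k);
    order
}

fn main() {
    let vectors = vec![
        vec![0.0, 0.0],
        vec![1.0, 0.0],
        vec![0.0, 3.0],
    ];
    // Index 1 is closest to the query, then index 0.
    assert_eq!(exact_knn(&vectors, &[0.9, 0.1], 2), vec![1, 0]);
    println!("ok");
}
```

Recall@k is then the fraction of the exact top-k IDs that also appear in the approximate index's top-k results.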
The main tuning parameters are `m` (maximum connections), `ef_construction` (construction parameter), and `max_layers`. For example (note the index dimensionality must match the inserted vectors, 3 here):

```rust
let mut index = HNSWIndex::new(3, 16, 200);
index.insert("node1".to_string(), vec![0.1, 0.2, 0.3], None)?;
let results = index.search(&[0.1, 0.2, 0.3], 1);
```
```bash
# Run benchmarks with millions of 3072-dimensional embeddings
cargo run --bin benchmark

# Generate performance plots
cargo run --bin plot_results
```
```rust
// `Article` is your own type stored alongside the embedding.
let doc = Document {
    id: "article_1".to_string(),
    data: Article { title: "AI Research", content: "..." },
};
index.insert("node1".to_string(), embedding, Some(doc))?;
```
```rust
// Insert multiple documents.
for (i, embedding) in embeddings.iter().enumerate() {
    let doc = Document {
        id: format!("doc_{}", i),
        data: documents[i].clone(),
    };
    index.insert(format!("node_{}", i), embedding.clone(), Some(doc))?;
}

// Remove multiple documents.
let ids = ["node_1", "node_2", "node_3"];
let removed = index.remove_multiple(&ids)?;
```
Contributions are welcome! Please feel free to submit issues, feature requests, or pull requests.
This project is licensed under the MIT License - see the LICENSE file for details.