| Field | Value |
|---|---|
| Crates.io | embeddenator-retrieval |
| lib.rs | embeddenator-retrieval |
| version | 0.20.0 |
| created_at | 2026-01-09 22:19:12.35991+00 |
| updated_at | 2026-01-25 18:28:45.765158+00 |
| description | Semantic retrieval and search operations for VSA-based vector representations |
| homepage | |
| repository | https://github.com/tzervas/embeddenator-retrieval |
| max_upload_size | |
| id | 2032994 |
| size | 204,786 |
Semantic retrieval and search operations for VSA-based vector representations.
Independent component extracted from the Embeddenator monolithic repository. Part of the Embeddenator workspace.
Repository: https://github.com/tzervas/embeddenator-retrieval
Phase 2B component implementation: full retrieval functionality migrated from the monolithic repository.
```rust
use embeddenator_retrieval::{TernaryInvertedIndex, search::two_stage_search, search::SearchConfig};
use embeddenator_vsa::SparseVec;
use std::collections::HashMap;

// Build the index
let mut index = TernaryInvertedIndex::new();
let mut vectors = HashMap::new();

let vec1 = SparseVec::from_data(b"document one");
let vec2 = SparseVec::from_data(b"document two");
let vec3 = SparseVec::from_data(b"document three");

index.add(1, &vec1);
index.add(2, &vec2);
index.add(3, &vec3);
index.finalize();

vectors.insert(1, vec1);
vectors.insert(2, vec2);
vectors.insert(3, vec3);

// Search with two-stage retrieval (fast candidate generation + accurate rerank)
let query = SparseVec::from_data(b"document");
let config = SearchConfig::default();
let results = two_stage_search(&query, &index, &vectors, &config, 5);

for result in results {
    println!("ID: {}, Score: {:.3}, Rank: {}",
        result.id, result.score, result.rank);
}
```
```rust
use embeddenator_retrieval::similarity::{compute_similarity, SimilarityMetric};
use embeddenator_vsa::SparseVec;

let a = SparseVec::from_data(b"hello");
let b = SparseVec::from_data(b"hello world");

let cosine = compute_similarity(&a, &b, SimilarityMetric::Cosine);
let hamming = compute_similarity(&a, &b, SimilarityMetric::Hamming);
let jaccard = compute_similarity(&a, &b, SimilarityMetric::Jaccard);

println!("Cosine: {:.3}, Hamming: {:.1}, Jaccard: {:.3}",
    cosine, hamming, jaccard);
```
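To make the three metrics concrete, here is a minimal self-contained sketch that computes them over plain sets of active dimensions. This is an illustration only: the toy `HashSet<u32>` representation and the `cosine`/`hamming`/`jaccard` helpers are hypothetical stand-ins, not the crate's `SparseVec` internals or API.

```rust
use std::collections::HashSet;

// Toy sparse binary vectors: each vector is the set of its active dimensions.

// Cosine similarity for binary vectors: |A ∩ B| / (sqrt(|A|) * sqrt(|B|)).
fn cosine(a: &HashSet<u32>, b: &HashSet<u32>) -> f64 {
    let dot = a.intersection(b).count() as f64;
    dot / ((a.len() as f64).sqrt() * (b.len() as f64).sqrt())
}

// Hamming distance: number of dimensions active in exactly one of the two vectors.
fn hamming(a: &HashSet<u32>, b: &HashSet<u32>) -> usize {
    a.symmetric_difference(b).count()
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|.
fn jaccard(a: &HashSet<u32>, b: &HashSet<u32>) -> f64 {
    let inter = a.intersection(b).count() as f64;
    let union = a.union(b).count() as f64;
    inter / union
}

fn main() {
    let a: HashSet<u32> = [1, 2, 3, 4].into_iter().collect();
    let b: HashSet<u32> = [3, 4, 5, 6].into_iter().collect();
    println!("cosine:  {:.3}", cosine(&a, &b));  // 2 / (2 * 2)  = 0.500
    println!("hamming: {}", hamming(&a, &b));    // |{1,2,5,6}|  = 4
    println!("jaccard: {:.3}", jaccard(&a, &b)); // 2 / 6        = 0.333
}
```

Note that cosine and Jaccard are similarities (higher is closer) while Hamming is a distance (lower is closer), which is why the crate example above prints them with different precisions.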
Estimated benchmarks on a modern multi-core CPU (corpus size = 10,000 vectors):
| Strategy | Latency (avg) | Throughput | Recall@10 |
|---|---|---|---|
| Approximate | ~0.5ms | ~2000 QPS | ~0.85 |
| Two-stage (candidate_k=200) | ~2ms | ~500 QPS | ~0.98 |
| Exact | ~15ms | ~66 QPS | 1.00 |
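The two-stage row's latency/recall tradeoff comes from the pattern of scoring the whole corpus with a cheap approximate metric, keeping only `candidate_k` candidates, and reranking just those with the exact metric. A minimal sketch of that pattern, assuming toy `Vec<u32>` vectors and hypothetical scoring (overlap count for stage 1, Jaccard for stage 2), not the crate's actual `two_stage_search`:

```rust
// Two-stage retrieval: cheap filter over everything, exact rerank over survivors.
fn two_stage(
    query: &[u32],
    corpus: &[Vec<u32>],
    candidate_k: usize,
    top_k: usize,
) -> Vec<(usize, f64)> {
    // Stage 1: approximate score = shared-element count (a cheap proxy).
    let mut candidates: Vec<(usize, usize)> = corpus
        .iter()
        .enumerate()
        .map(|(id, v)| (id, v.iter().filter(|&x| query.contains(x)).count()))
        .collect();
    candidates.sort_by(|a, b| b.1.cmp(&a.1));
    candidates.truncate(candidate_k);

    // Stage 2: exact score = Jaccard, computed only for the surviving candidates.
    let mut exact: Vec<(usize, f64)> = candidates
        .into_iter()
        .map(|(id, _)| {
            let v = &corpus[id];
            let inter = v.iter().filter(|&x| query.contains(x)).count() as f64;
            let union = (v.len() + query.len()) as f64 - inter;
            (id, inter / union)
        })
        .collect();
    exact.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    exact.truncate(top_k);
    exact
}

fn main() {
    let corpus = vec![vec![1u32, 2, 3], vec![2, 3, 4], vec![7, 8, 9]];
    // Stage 1 drops the unrelated doc 2; stage 2 reranks docs 0 and 1 exactly.
    let results = two_stage(&[2, 3], &corpus, 2, 1);
    println!("{:?}", results);
}
```

Raising `candidate_k` moves the curve toward the Exact row (higher recall, higher latency); lowering it moves toward the Approximate row.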
Note: Actual performance varies significantly based on hardware, vector dimensionality, data distribution, and query patterns. Run the benchmarks on your system for accurate numbers:

```shell
cargo bench --manifest-path embeddenator-retrieval/Cargo.toml
```
```shell
# Run all tests
cargo test --manifest-path embeddenator-retrieval/Cargo.toml --all-features

# Run specific test suites
cargo test --manifest-path embeddenator-retrieval/Cargo.toml similarity_tests
cargo test --manifest-path embeddenator-retrieval/Cargo.toml search_tests

# Run with output
cargo test --manifest-path embeddenator-retrieval/Cargo.toml -- --nocapture

# Build
cargo build --manifest-path embeddenator-retrieval/Cargo.toml
```
For local development alongside other Embeddenator components, add this to the workspace Cargo.toml:

```toml
[patch."https://github.com/tzervas/embeddenator-retrieval"]
embeddenator-retrieval = { path = "../embeddenator-retrieval" }
```
Uses SparseVec from embeddenator-vsa for all vector operations. See ADR-016 for the component decomposition rationale.
License: MIT