| Crates.io | sochdb-index |
| lib.rs | sochdb-index |
| version | 0.4.3 |
| created_at | 2026-01-12 17:09:24.443947+00 |
| updated_at | 2026-01-23 20:33:47.78202+00 |
| description | SochDB indexing (HNSW vector index and related utilities) |
| homepage | https://sochdb.dev |
| repository | https://github.com/sochdb/sochdb |
| id | 2038223 |
| size | 1,708,507 |
High-performance vector indexing and embedding integration for agent observability.
Complete embedding pipeline for semantic search:
use sochdb_index::embedding::{LocalEmbeddingProvider, EmbeddingProvider};
// Create local provider (offline, no API cost)
let provider = LocalEmbeddingProvider::default_provider()?;
// Embed text
let vector = provider.embed("Find traces with errors")?;
println!("Embedded to {} dimensions", vector.len());
// Batch embedding
let texts = vec!["query 1", "query 2", "query 3"];
let vectors = provider.embed_batch(&texts)?;
use sochdb_index::embedding::{
    EmbeddingIntegration, IntegrationConfig, LocalEmbeddingProvider,
};
use std::sync::Arc;
// Create provider and integration
let provider = Arc::new(LocalEmbeddingProvider::default_provider()?);
let mut integration = EmbeddingIntegration::new(provider, IntegrationConfig::default())?;
// Start background worker
integration.start_background_worker();
// Submit traces for embedding (non-blocking)
integration.submit_for_embedding(edge_id, "Error in authentication module")?;
// Semantic search
let results = integration.semantic_search("find auth errors", 10)?;
for result in results {
    println!("Edge {}: similarity={:.3}", result.edge_id, result.similarity);
}
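To make the `similarity` score above concrete, here is a minimal sketch of an exhaustive cosine-similarity search, the exact baseline that approximate indexes such as HNSW and Vamana try to match at lower cost. It does not use the `sochdb_index` API; the function names `cosine_similarity` and `brute_force_top_k` are hypothetical, and whether the crate scores results with cosine similarity is an assumption.

```rust
/// Cosine similarity between two equal-length vectors.
/// Returns 0.0 if either vector has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Exhaustive top-k search (hypothetical helper, not part of sochdb_index):
/// score every stored vector against the query and return the k best
/// (id, similarity) pairs, highest similarity first.
fn brute_force_top_k(
    query: &[f32],
    stored: &[(u64, Vec<f32>)],
    k: usize,
) -> Vec<(u64, f32)> {
    let mut scored: Vec<(u64, f32)> = stored
        .iter()
        .map(|(id, v)| (*id, cosine_similarity(query, v)))
        .collect();
    // Sort descending by similarity; total_cmp gives a total order on f32.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

A scan like this costs one dot product per stored vector per query, which is why a graph index becomes worthwhile as the collection grows.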
use sochdb_index::{VamanaConfig, VamanaIndex, PQCodebooks};
// Train PQ codebooks on sample vectors
let codebooks = PQCodebooks::train(&sample_vectors, 20, 8);
// Create Vamana index with PQ
let config = VamanaConfig::default();
let mut index = VamanaIndex::new(384, config);
// Insert vectors (automatically PQ-encoded)
for (id, vector) in vectors {
    index.insert(id, vector);
}
// Search
let results = index.search(&query_vector, 10);
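For readers unfamiliar with product quantization, the encoding step can be sketched as follows: the vector is split into equal subvectors, and each subvector is replaced by the index of its nearest centroid in that subspace's codebook. This is a generic illustration, not the crate's implementation; `pq_encode`, `squared_l2`, and the nested-`Vec` codebook layout are assumptions made for the example.

```rust
/// Squared Euclidean distance between two equal-length slices.
fn squared_l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Encode `vector` as one byte per subspace (hypothetical helper).
/// `codebooks[s]` holds the centroids for subspace `s`; each centroid has
/// `vector.len() / codebooks.len()` components. A u8 code limits each
/// subspace to at most 256 centroids.
fn pq_encode(vector: &[f32], codebooks: &[Vec<Vec<f32>>]) -> Vec<u8> {
    let num_subspaces = codebooks.len();
    assert!(num_subspaces > 0 && !vector.is_empty());
    assert!(vector.len() % num_subspaces == 0, "dimension must divide evenly");
    let sub_dim = vector.len() / num_subspaces;
    vector
        .chunks(sub_dim)
        .zip(codebooks)
        .map(|(sub, centroids)| {
            assert!(!centroids.is_empty() && centroids.len() <= 256);
            // Linear scan for the nearest centroid in this subspace.
            let mut best = 0usize;
            let mut best_dist = f32::INFINITY;
            for (i, c) in centroids.iter().enumerate() {
                let d = squared_l2(sub, c);
                if d < best_dist {
                    best = i;
                    best_dist = d;
                }
            }
            best as u8
        })
        .collect()
}
```

With this layout, a vector of any dimension is stored as `codebooks.len()` bytes, at the cost of the approximation error introduced by snapping each subvector to a centroid.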
With Product Quantization (PQ):
| Vectors | Full F32 | PQ Compressed | Savings |
|---|---|---|---|
| 1M | 1.5 GB | 48 MB | 97% |
| 10M | 15 GB | 480 MB | 97% |
| 100M | 150 GB | 4.8 GB | 97% |
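The figures in this table are consistent with 384-dimensional `f32` vectors (1,536 bytes each) compressed to 48-byte PQ codes, a 32x reduction. The 48-byte code size is inferred from the table rather than stated by the crate, and the helper names below are hypothetical. A short sketch of the arithmetic:

```rust
/// Bytes needed to store `n` uncompressed vectors of `dim` f32 components.
fn full_f32_bytes(n: u64, dim: u64) -> u64 {
    n * dim * 4 // 4 bytes per f32
}

/// Bytes needed to store `n` PQ codes of `code_bytes` each.
/// The codebooks themselves are excluded; they are a fixed overhead.
fn pq_bytes(n: u64, code_bytes: u64) -> u64 {
    n * code_bytes
}

/// Percentage of memory saved by storing `compressed` bytes instead of `full`.
fn savings_percent(full: u64, compressed: u64) -> f64 {
    100.0 * (1.0 - compressed as f64 / full as f64)
}
```

For one million vectors this gives 1,536,000,000 bytes uncompressed against 48,000,000 bytes compressed, a saving of 96.875%, which the table rounds to 97%.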
Trace → LSM Write (sync) → CSR Update (sync) → Embedding Queue (async)
                                                          ↓
                                                  Background Worker
                                                          ↓
                                            Embed → Normalize → PQ Encode
                                                          ↓
                                                 HNSW/Vamana Insert
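The asynchronous half of this flow can be sketched with a standard-library channel and a worker thread: producers enqueue `(edge_id, text)` jobs without blocking, and the worker embeds and L2-normalizes each one. This is an illustration under stated assumptions, not the crate's code: `toy_embed` is a hypothetical stand-in for a real embedding model, the PQ encoding step is omitted, and a plain `Vec` stands in for the HNSW/Vamana insert.

```rust
use std::sync::mpsc;
use std::thread;

/// Scale `v` to unit L2 norm in place; a zero vector is left unchanged.
fn l2_normalize(v: &mut [f32]) {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
}

/// Hypothetical stand-in for an embedding model: folds the text's bytes
/// into a 4-dimensional vector. A real provider would run a model here.
fn toy_embed(text: &str) -> Vec<f32> {
    let mut v = vec![0.0f32; 4];
    for (i, b) in text.bytes().enumerate() {
        v[i % 4] += b as f32;
    }
    v
}

/// Spawn a worker that drains (edge_id, text) jobs until every sender is
/// dropped, then returns the normalized embeddings it produced.
fn spawn_worker(
    jobs: mpsc::Receiver<(u64, String)>,
) -> thread::JoinHandle<Vec<(u64, Vec<f32>)>> {
    thread::spawn(move || {
        let mut indexed = Vec::new();
        for (edge_id, text) in jobs {
            let mut v = toy_embed(&text);
            l2_normalize(&mut v);
            // Stand-in for the index insert step.
            indexed.push((edge_id, v));
        }
        indexed
    })
}
```

Sending on an unbounded `mpsc` channel never blocks the producer, which mirrors the non-blocking submit in the diagram; a bounded channel (`mpsc::sync_channel`) would add backpressure if the worker falls behind.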
- hnsw - HNSW index with concurrent insert/search
- vamana - DiskANN-style single-layer graph
- product_quantization - 32x vector compression
- embedding/provider - Embedding abstraction trait
- embedding/pipeline - Batched processing pipeline
- embedding/storage - Persistent embedding storage
- embedding/normalize - SIMD L2 normalization
- embedding/index_integration - Connect pipeline to index

# Run all tests
cargo test -p sochdb-index
# Run embedding tests only
cargo test -p sochdb-index embedding::
# Run integration tests
cargo test -p sochdb-index --test vamana_integration_test
| Provider | Single text (latency) | Batch of 32 (latency) |
|---|---|---|
| Local | ~5 ms | ~50 ms |
| OpenAI | ~100 ms | ~150 ms |

| Index | Top-10 search (latency) | Top-100 search (latency) |
|---|---|---|
| HNSW | <5 ms | <10 ms |
| Vamana | <3 ms | <8 ms |
Licensed under the same terms as the parent SochDB project.