| Crates.io | lethe-core-rust |
| lib.rs | lethe-core-rust |
| version | 0.1.1 |
| created_at | 2025-09-14 18:39:06.137346+00 |
| updated_at | 2025-09-14 23:14:07.552842+00 |
| description | High-performance hybrid retrieval engine combining BM25 lexical search with vector similarity using z-score fusion. Features hero configuration for optimal parity with splade baseline, gamma boosting for code/error contexts, and comprehensive chunking pipeline. |
| homepage | https://github.com/nrice/lethe |
| repository | https://github.com/nrice/lethe |
| max_upload_size | |
| id | 1839035 |
| size | 7,296,159 |
A high-performance hybrid retrieval engine that combines BM25 lexical search with vector similarity using z-score fusion. Lethe Core provides state-of-the-art context selection for conversational AI and retrieval-augmented generation (RAG) systems.
Add this to your Cargo.toml:
[dependencies]
lethe-core-rust = "0.1.0"
use lethe_core_rust::{get_hero_config, apply_zscore_fusion, Candidate};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Get the hero configuration (optimal for splade parity)
    let config = get_hero_config();
    println!("Hero config: α={}, β={}, k_final={}",
             config.alpha, config.beta, config.k_final);

    // Example BM25 candidates
    let bm25_candidates = vec![
        Candidate {
            doc_id: "doc1".to_string(),
            score: 0.8,
            text: Some("Rust async programming tutorial".to_string()),
            kind: Some("bm25".to_string()),
        },
        Candidate {
            doc_id: "doc2".to_string(),
            score: 0.6,
            text: Some("Python async examples".to_string()),
            kind: Some("bm25".to_string()),
        },
    ];

    // Example vector candidates
    let vector_candidates = vec![
        Candidate {
            doc_id: "doc1".to_string(),
            score: 0.9,
            text: Some("Rust async programming tutorial".to_string()),
            kind: Some("vector".to_string()),
        },
        Candidate {
            doc_id: "doc3".to_string(),
            score: 0.7,
            text: Some("Async programming concepts".to_string()),
            kind: Some("vector".to_string()),
        },
    ];

    // Apply z-score fusion with hero configuration (α=0.5)
    let results = apply_zscore_fusion(bm25_candidates, vector_candidates, 0.5);

    println!("Hybrid results:");
    for (i, result) in results.iter().enumerate() {
        println!("{}. {} (score: {:.3})", i + 1, result.doc_id, result.score);
    }

    Ok(())
}
use lethe_core_rust::{
    HybridRetrievalService, HybridRetrievalConfig,
    ChunkingService, ChunkingConfig
};
use std::sync::Arc;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create chunking service
    let chunking_config = ChunkingConfig::default();
    let chunking_service = ChunkingService::new(chunking_config);

    // Chunk some text
    let text = "This is a sample document about Rust programming. \
                Rust is a systems programming language. \
                It provides memory safety without garbage collection.";
    let chunks = chunking_service.chunk_text(
        text,
        "session-123",
        uuid::Uuid::new_v4(),
        0
    ).await?;
    println!("Created {} chunks", chunks.len());

    // Set up hybrid retrieval with hero configuration
    let config = HybridRetrievalConfig::hero();
    let service = HybridRetrievalService::mock_for_testing();

    println!("Hero configuration loaded:");
    println!("  α (BM25 weight): {}", config.alpha);
    println!("  β (Vector weight): {}", config.beta);
    println!("  k_initial (pool size): {}", config.k_initial);
    println!("  k_final (results): {}", config.k_final);
    println!("  Diversification: {}", config.diversify_method);

    Ok(())
}
Lethe Core implements a sophisticated z-score fusion algorithm that normalizes and combines scores from different retrieval methods:
hybrid_score = α * z_bm25 + β * z_vector
The hero configuration provides optimal parameters validated against splade baseline performance:
tokio::try_join!
[dependencies]
lethe-core-rust = { version = "0.1.0", features = ["ollama"] }
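To make the fusion formula hybrid_score = α * z_bm25 + β * z_vector concrete, here is a minimal sketch of one way such a fusion could be computed. It is not the crate's implementation: the names Scored, z_scores, and fuse_zscores are hypothetical, the population standard deviation is an assumed choice, and a document missing from one list is assumed to contribute a z-score of 0 (the list mean) for that retriever. The real apply_zscore_fusion may handle each of these points differently.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the crate's `Candidate`, reduced to the
/// two fields the fusion needs. Not the real type.
#[derive(Debug, Clone, PartialEq)]
struct Scored {
    doc_id: String,
    score: f64,
}

/// Map each document to its z-score within one result list:
/// z = (score - mean) / std_dev, using the population standard deviation.
/// Assumption: if all scores are equal (std_dev == 0), every z-score is 0.
fn z_scores(items: &[Scored]) -> HashMap<String, f64> {
    if items.is_empty() {
        return HashMap::new();
    }
    let n = items.len() as f64;
    let mean = items.iter().map(|c| c.score).sum::<f64>() / n;
    let variance = items.iter().map(|c| (c.score - mean).powi(2)).sum::<f64>() / n;
    let std_dev = variance.sqrt();
    items
        .iter()
        .map(|c| {
            let z = if std_dev > 0.0 { (c.score - mean) / std_dev } else { 0.0 };
            (c.doc_id.clone(), z)
        })
        .collect()
}

/// Combine two result lists as alpha * z_bm25 + beta * z_vector.
/// Assumption: a document absent from one list contributes 0 for that list.
/// Results are sorted by fused score, highest first (ties broken by doc_id).
fn fuse_zscores(bm25: &[Scored], vector: &[Scored], alpha: f64, beta: f64) -> Vec<Scored> {
    let mut fused: HashMap<String, f64> = HashMap::new();
    for (doc_id, z) in z_scores(bm25) {
        *fused.entry(doc_id).or_insert(0.0) += alpha * z;
    }
    for (doc_id, z) in z_scores(vector) {
        *fused.entry(doc_id).or_insert(0.0) += beta * z;
    }
    let mut out: Vec<Scored> = fused
        .into_iter()
        .map(|(doc_id, score)| Scored { doc_id, score })
        .collect();
    out.sort_by(|a, b| {
        b.score
            .total_cmp(&a.score)
            .then_with(|| a.doc_id.cmp(&b.doc_id))
    });
    out
}
```

Under these assumptions, feeding in the quick-start scores (BM25: doc1 = 0.8, doc2 = 0.6; vector: doc1 = 0.9, doc3 = 0.7) with α = 0.5 and an illustrative β = 0.5 ranks doc1 first with a fused score of about 1.0, because it is one standard deviation above the mean in both lists. Normalizing before weighting is what lets BM25 scores and similarity scores, which live on different scales, be added meaningfully.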
Lethe Core follows a modular architecture:
- lethe-shared: Common types, errors, and utilities
- lethe-domain: Core business logic and services
- lethe-infrastructure: External integrations and adapters

- Candidate: Search result with document ID, score, and metadata
- HybridRetrievalConfig: Configuration for retrieval parameters
- HybridRetrievalService: Main service orchestrating hybrid search
- Chunk: Text segment with tokenization and metadata
- EmbeddingVector: Vector representation with similarity operations

The repository includes several examples:
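Separately from the repository's own examples, and only as an illustration of the type list above: the "similarity operations" on EmbeddingVector plausibly include cosine similarity. The sketch below uses a hypothetical stand-in struct, EmbeddingSketch, with assumed field and method names; the crate's real EmbeddingVector may store its data and expose its operations differently.

```rust
/// Hypothetical stand-in for `EmbeddingVector`. The field name and the
/// methods below are assumptions for illustration, not the crate's API.
#[derive(Debug, Clone, PartialEq)]
struct EmbeddingSketch {
    values: Vec<f32>,
}

impl EmbeddingSketch {
    /// Dot product; `None` when the dimensions differ.
    fn dot(&self, other: &Self) -> Option<f32> {
        if self.values.len() != other.values.len() {
            return None;
        }
        Some(
            self.values
                .iter()
                .zip(&other.values)
                .map(|(a, b)| a * b)
                .sum(),
        )
    }

    /// Euclidean (L2) norm of the vector.
    fn norm(&self) -> f32 {
        self.values.iter().map(|v| v * v).sum::<f32>().sqrt()
    }

    /// Cosine similarity in [-1, 1]; `None` when the dimensions differ
    /// or either vector has zero length (the value is undefined there).
    fn cosine_similarity(&self, other: &Self) -> Option<f32> {
        let denom = self.norm() * other.norm();
        if denom == 0.0 {
            return None;
        }
        self.dot(other).map(|d| d / denom)
    }
}
```

Returning an Option rather than panicking is a design choice of this sketch: it keeps dimension mismatches and zero vectors explicit at the call site.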
Run the test suite:
cargo test
Run with logging enabled:
RUST_LOG=debug cargo test -- --nocapture
The hero configuration has been validated against golden parity snapshots ensuring consistent performance with the TypeScript implementation. Key validation points:
Contributions are welcome! Please ensure the following commands run cleanly before submitting:
- cargo test
- cargo fmt
- cargo clippy
- cargo doc --no-deps
This project is licensed under the MIT License - see the LICENSE file for details.
If you use Lethe Core in your research, please cite:
@software{lethe_core_rust,
  title={Lethe Core: High-Performance Hybrid Retrieval with Z-Score Fusion},
  author={Nathan Rice},
  year={2024},
  url={https://github.com/nrice/lethe}
}