| Crates.io | searus |
| lib.rs | searus |
| version | 0.0.3 |
| created_at | 2025-12-09 20:13:26.869964+00 |
| updated_at | 2025-12-11 10:52:02.12047+00 |
| description | A simple extension-based search engine library |
| homepage | |
| repository | https://github.com/kevinj045/searus |
| max_upload_size | |
| id | 1976273 |
| size | 159,616 |
A flexible, multi-modal search engine library for Rust.
Searus is a powerful search engine library that provides multiple search strategies out of the box:
- IndexAdapter trait
- serde support

Add this to your Cargo.toml:
[dependencies]
searus = "0.0.3"
use searus::prelude::*;
use searus::searchers::SemanticSearch;
use serde::{Deserialize, Serialize};
#[derive(Debug, Clone, Serialize, Deserialize)]
struct Post {
    title: String,
    content: String,
    tags: Vec<String>,
}

fn main() {
    // Configure semantic search rules
    let rules = SemanticRules::builder()
        .field("title", FieldRule::bm25().priority(3).boost(2.0))
        .field("content", FieldRule::bm25().priority(2).boost(1.0))
        .build();

    // Create a searcher
    let searcher = SemanticSearch::new(rules);

    // Build the search engine
    let engine: SearusEngine<Post> = SearusEngine::builder()
        .with(Box::new(searcher))
        .build();

    // Your documents
    let posts = vec![
        Post {
            title: "Getting Started with Rust".to_string(),
            content: "Rust is a systems programming language...".to_string(),
            tags: vec!["rust".to_string(), "tutorial".to_string()],
        },
        // ... more posts
    ];

    // Search!
    let query = Query::builder()
        .text("rust programming")
        .options(SearchOptions::default().limit(10))
        .build();
    let results = engine.search(&posts, &query);
    for result in results {
        println!("{} (score: {:.3})", result.item.title, result.score);
    }
}
BM25-based text search with configurable field rules and matching strategies:
use searus::prelude::*;
use searus::searchers::SemanticSearch;
let rules = SemanticRules::builder()
    .field("title", FieldRule::bm25().priority(3).boost(2.0))
    .field("content", FieldRule::tokenized().priority(1))
    .field("author", FieldRule::exact())
    .build();
let searcher = SemanticSearch::new(rules);
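For readers unfamiliar with BM25, here is a minimal standalone sketch of Okapi BM25 scoring that uses only the standard library. It is not Searus's implementation: the names `tokenize` and `bm25_scores`, the simple tokenizer, the smoothed IDF variant, and the `k1`/`b` parameters are all assumptions chosen for illustration.

```rust
use std::collections::HashMap;

/// Split text into lowercase alphanumeric tokens (a deliberately naive tokenizer).
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Score every document against the query with Okapi BM25.
/// `k1` and `b` are the usual free parameters (commonly 1.2 and 0.75).
fn bm25_scores(docs: &[&str], query: &str, k1: f64, b: f64) -> Vec<f64> {
    if docs.is_empty() {
        return Vec::new();
    }
    let tokenized: Vec<Vec<String>> = docs.iter().map(|d| tokenize(d)).collect();
    let n = tokenized.len() as f64;
    let avg_len = tokenized.iter().map(|d| d.len()).sum::<usize>() as f64 / n;

    // Document frequency: number of documents containing each term.
    let mut df: HashMap<&str, usize> = HashMap::new();
    for doc in &tokenized {
        let mut seen: Vec<&str> = doc.iter().map(String::as_str).collect();
        seen.sort_unstable();
        seen.dedup();
        for term in seen {
            *df.entry(term).or_insert(0) += 1;
        }
    }

    let query_terms = tokenize(query);
    tokenized
        .iter()
        .map(|doc| {
            let len = doc.len() as f64;
            query_terms
                .iter()
                .map(|term| {
                    let tf = doc.iter().filter(|t| *t == term).count() as f64;
                    if tf == 0.0 {
                        return 0.0; // term absent: contributes nothing
                    }
                    let n_t = df.get(term.as_str()).copied().unwrap_or(0) as f64;
                    // Smoothed IDF variant that never goes negative.
                    let idf = ((n - n_t + 0.5) / (n_t + 0.5) + 1.0).ln();
                    idf * tf * (k1 + 1.0) / (tf + k1 * (1.0 - b + b * len / avg_len))
                })
                .sum::<f64>()
        })
        .collect()
}
```

Calling `bm25_scores(&docs, "rust programming", 1.2, 0.75)` returns one score per document, in input order. A document that shares no term with the query scores 0, and rarer query terms contribute more through the IDF factor.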
Matching Strategies:
- Matcher::BM25 - Full BM25 scoring with IDF
- Matcher::Tokenized - Simple term frequency matching
- Matcher::Exact - Case-insensitive exact string matching
- Matcher::Fuzzy - Delegated to FuzzySearch

Match documents by tags with configurable field names:
use searus::searchers::TaggedSearch;
// Default field name is "tags"
let tag_searcher = TaggedSearch::new();
// Or specify a custom field
let tag_searcher = TaggedSearch::with_field("categories");
let query = Query::builder()
    .tags(vec!["rust".to_string(), "tutorial".to_string()])
    .build();
Enhance tag-based search by defining relationships between tags. This allows queries for a parent tag (e.g., "programming") to automatically include results for child tags (e.g., "rust", "python").
use searus::searchers::tagged::{TagNode, TagRelationshipTree};
use std::collections::HashMap;
// Define tag relationships
let nodes = vec![
    TagNode {
        tag: "rust".to_string(),
        relationships: HashMap::from([("programming".to_string(), 0.8)]),
    },
    TagNode {
        tag: "python".to_string(),
        relationships: HashMap::from([("programming".to_string(), 0.7)]),
    },
];
let trt = TagRelationshipTree::new(nodes);
// Configure searcher with TRT
let tag_searcher = TaggedSearch::new().with_trt(trt);
// Query with TRT expansion (depth 1)
let query = Query::builder()
    .tags(vec!["programming".to_string()])
    .with_trt(1)
    .build();
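To make the idea of depth-limited tag expansion concrete, here is a standalone sketch that does not use Searus. It assumes each child tag lists its parents with a strength, as in the `TagNode` example above, and that weights multiply along the path. How Searus actually combines relationship strengths is not documented here, so the `Relationships` alias, the `expand_tags` function, and the multiplication rule are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical relationship table: child tag -> (parent tag -> strength).
type Relationships = HashMap<String, HashMap<String, f32>>;

/// Expand `query_tags` up to `depth` hops from parent to child.
/// Returns every reachable tag with its best path weight (query tags get 1.0).
fn expand_tags(rels: &Relationships, query_tags: &[&str], depth: usize) -> HashMap<String, f32> {
    let mut expanded: HashMap<String, f32> =
        query_tags.iter().map(|t| (t.to_string(), 1.0)).collect();
    let mut frontier: Vec<(String, f32)> =
        expanded.iter().map(|(t, w)| (t.clone(), *w)).collect();

    for _ in 0..depth {
        let mut next = Vec::new();
        for (parent, parent_weight) in &frontier {
            for (child, parents) in rels {
                if let Some(strength) = parents.get(parent) {
                    // Assumption: strengths multiply along the path.
                    let weight = parent_weight * strength;
                    let best = expanded.entry(child.clone()).or_insert(0.0);
                    if weight > *best {
                        *best = weight;
                        next.push((child.clone(), weight));
                    }
                }
            }
        }
        frontier = next;
    }
    expanded
}
```

With the two nodes above, expanding `["programming"]` at depth 1 yields `programming` at weight 1.0, `rust` at 0.8, and `python` at 0.7. A depth of 0 returns only the query tags.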
String similarity matching using Jaro-Winkler distance:
use searus::searchers::FuzzySearch;
let fuzzy_searcher = FuzzySearch::new(vec!["title".to_string(), "content".to_string()])
    .with_threshold(0.8); // Minimum similarity: 0.0 to 1.0
let query = Query::builder()
    .text("programing") // Will match "programming"
    .build();
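To show why a misspelling such as "programing" clears a 0.8 threshold, here is a standalone sketch of the textbook Jaro-Winkler similarity. It is not taken from Searus. The function names `jaro` and `jaro_winkler`, the prefix scale of 0.1, and the 4-character prefix cap are the conventional defaults, assumed here for illustration.

```rust
/// Jaro similarity between two strings, in [0.0, 1.0].
fn jaro(a: &str, b: &str) -> f64 {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    if a.is_empty() && b.is_empty() {
        return 1.0;
    }
    if a.is_empty() || b.is_empty() {
        return 0.0;
    }

    // Characters match only if they are equal and within this distance.
    let window = (a.len().max(b.len()) / 2).saturating_sub(1);
    let mut a_matched = vec![false; a.len()];
    let mut b_matched = vec![false; b.len()];
    let mut matches = 0usize;

    for i in 0..a.len() {
        let lo = i.saturating_sub(window);
        let hi = (i + window + 1).min(b.len());
        for j in lo..hi {
            if !b_matched[j] && a[i] == b[j] {
                a_matched[i] = true;
                b_matched[j] = true;
                matches += 1;
                break;
            }
        }
    }
    if matches == 0 {
        return 0.0;
    }

    // Count matched characters that appear in a different order.
    let mut out_of_order = 0usize;
    let mut j = 0;
    for i in 0..a.len() {
        if !a_matched[i] {
            continue;
        }
        while !b_matched[j] {
            j += 1;
        }
        if a[i] != b[j] {
            out_of_order += 1;
        }
        j += 1;
    }

    let m = matches as f64;
    let t = out_of_order as f64 / 2.0; // transpositions
    (m / a.len() as f64 + m / b.len() as f64 + (m - t) / m) / 3.0
}

/// Jaro-Winkler: boosts the Jaro score for a shared prefix of up to 4 characters.
fn jaro_winkler(a: &str, b: &str) -> f64 {
    let j = jaro(a, b);
    let prefix = a
        .chars()
        .zip(b.chars())
        .take(4)
        .take_while(|(x, y)| x == y)
        .count();
    j + prefix as f64 * 0.1 * (1.0 - j)
}
```

Under this definition, `jaro_winkler("programing", "programming")` is well above 0.8, while unrelated words such as "rust" and "python" fall far below it.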
Combine multiple searchers with custom weights:
use searus::prelude::*;
use searus::searchers::{SemanticSearch, TaggedSearch, FuzzySearch};
let semantic_rules = SemanticRules::builder()
    .field("title", FieldRule::bm25().priority(2))
    .field("content", FieldRule::tokenized())
    .build();
let engine = SearusEngine::builder()
    .with(Box::new(SemanticSearch::new(semantic_rules)))
    .with(Box::new(TaggedSearch::new()))
    .with(Box::new(FuzzySearch::new(vec!["title".to_string()])))
    .build();
let query = Query::builder()
    .text("rust")
    .tags(vec!["tutorial".to_string()])
    .options(
        SearchOptions::default()
            .weight(SearcherKind::Semantic, 0.6)
            .weight(SearcherKind::Tags, 0.4)
    )
    .build();
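One plausible way to read these weights is as coefficients in a weighted sum of per-searcher scores. The sketch below illustrates that reading only. The README does not specify how Searus merges results, so `merge_weighted`, the `Scored` alias, and the weighted-sum rule are assumptions.

```rust
use std::collections::HashMap;

/// One searcher's output: (document id, score).
type Scored = Vec<(usize, f32)>;

/// Merge several searchers' results as a weighted sum of their scores.
/// Each input pair is (searcher weight, that searcher's results).
/// Returns documents sorted by combined score, highest first.
fn merge_weighted(results: &[(f32, Scored)]) -> Vec<(usize, f32)> {
    let mut combined: HashMap<usize, f32> = HashMap::new();
    for (weight, scored) in results {
        for (doc_id, score) in scored {
            *combined.entry(*doc_id).or_insert(0.0) += weight * score;
        }
    }
    let mut merged: Vec<(usize, f32)> = combined.into_iter().collect();
    // Sort by score descending; break ties by document id for determinism.
    merged.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    merged
}
```

Under this rule, a document found by both a semantic searcher (weight 0.6) and a tag searcher (weight 0.4) can outrank a document that only one of them scored highly.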
Customize the search lifecycle with the SearusExtension trait. Extensions can intercept queries, modify items, and alter results.
use searus::prelude::*;
struct LoggingExtension;
impl<T: Searchable> SearusExtension<T> for LoggingExtension {
    fn before_query(&self, query: &mut Query) {
        println!("Executing query: {:?}", query);
    }

    fn after_searcher(&self, _query: &Query, results: &mut Vec<SearusMatch<T>>) {
        println!("Searcher returned {} results", results.len());
    }
}

// Register extension in the engine
let engine: SearusEngine<Post> = SearusEngine::builder()
    .with(Box::new(searcher))
    .with_extension(Box::new(LoggingExtension))
    .build();
Implement your own search strategies by implementing the Searcher trait.
use searus::prelude::*;
struct MySearcher;
impl<T: Searchable> Searcher<T> for MySearcher {
    fn kind(&self) -> SearcherKind {
        SearcherKind::Custom
    }

    fn search(&self, context: &SearchContext<T>, query: &Query) -> Vec<SearusMatch<T>> {
        // Implement your search logic here
        vec![]
    }
}
This allows you to plug in any algorithm (e.g., TF-IDF, LSH, experimental models) and combine it with built-in searchers.
For large datasets (100k+ entities), consider these optimization strategies:
- Enable the parallel feature to use rayon for concurrent search execution.
- Use an IndexAdapter that supports ANN (e.g., HNSW) instead of brute-force KNN.

Searus supports pluggable storage backends through the IndexAdapter trait:
use searus::index::{IndexAdapter, InMemIndex};
// Built-in in-memory index
let mut index: InMemIndex<Post> = InMemIndex::new();
index.put(
    "post-1".to_string(),
    post,
    Some(embedding_vector), // Optional vector for KNN search
    Some(vec!["rust".to_string()]), // Optional tags
).unwrap();
// Find nearest neighbors
let neighbors = index.knn(&query_vector, 10);
Implement IndexAdapter for your own storage backend (e.g., PostgreSQL, Redis, Qdrant).
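For context on what brute-force KNN costs, here is a standalone sketch of an exhaustive nearest-neighbor scan using cosine similarity. It is independent of Searus: the `cosine` and `knn` functions and the choice of cosine as the metric are assumptions, since the README does not state which distance `InMemIndex` uses.

```rust
/// Cosine similarity of two vectors; returns 0.0 if either has zero length.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Exhaustive k-nearest-neighbor search: score every stored vector,
/// sort by similarity (highest first), and keep the top `k`.
fn knn<'a>(items: &'a [(String, Vec<f32>)], query: &[f32], k: usize) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&'a str, f32)> = items
        .iter()
        .map(|(id, vector)| (id.as_str(), cosine(vector, query)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

Every query touches every stored vector, so the cost grows linearly with the index size. That is why an ANN structure such as HNSW becomes attractive for large datasets.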
Searus provides traits for embedding providers:
use searus::embeddings::{TextEmbedder, StubTextEmbedder};
// Built-in stub embedder for testing
let embedder = StubTextEmbedder::new(384); // 384-dimensional vectors
let embedding = embedder.embed("Hello, world!")?;
// Implement TextEmbedder for your own provider (OpenAI, Cohere, local models, etc.)
Fine-tune your search with query options:
let query = Query::builder()
    .text("rust programming")
    .tags(vec!["tutorial".to_string()])
    .options(
        SearchOptions::default()
            .limit(20) // Max results
            .skip(10) // Pagination offset
            .timeout_ms(5000) // Search timeout
            .weight(SearcherKind::Semantic, 0.7) // Searcher weights
            .weight(SearcherKind::Tags, 0.3)
    )
    .filters(
        // views >= 1000 OR author = Bob
        Query::filter(Query::OR)
            .with(
                Query::filter(Query::COMPARE)
                    .ge("views", 1000)
                    .build()
            )
            .with(
                Query::filter(Query::COMPARE)
                    .eq("author", "Bob")
                    .build()
            )
            .build()
    )
    .build();
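The semantics of a nested filter and of `skip`/`limit` can be illustrated without Searus. The sketch below models the filter above as a small tree and evaluates it against a field map, then applies offset pagination. The `Value` and `Filter` types and the `passes` and `paginate` functions are hypothetical stand-ins and do not reflect the crate's internal representation.

```rust
use std::collections::HashMap;

/// Hypothetical field value: an integer or a string.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Text(String),
}

/// Hypothetical filter tree with comparisons at the leaves.
enum Filter {
    Ge(String, i64),   // field >= number
    Eq(String, Value), // field == value
    Or(Vec<Filter>),
    And(Vec<Filter>),
}

/// Evaluate a filter against one document's fields.
/// A missing field, or a field of the wrong type, fails the comparison.
fn passes(filter: &Filter, doc: &HashMap<String, Value>) -> bool {
    match filter {
        Filter::Ge(field, min) => matches!(doc.get(field), Some(Value::Int(v)) if v >= min),
        Filter::Eq(field, expected) => doc.get(field) == Some(expected),
        Filter::Or(children) => children.iter().any(|c| passes(c, doc)),
        Filter::And(children) => children.iter().all(|c| passes(c, doc)),
    }
}

/// Offset pagination over an already-ranked result list.
fn paginate<T>(ranked: Vec<T>, skip: usize, limit: usize) -> Vec<T> {
    ranked.into_iter().skip(skip).take(limit).collect()
}
```

For the filter "views >= 1000 OR author = Bob", a document with 1500 views passes regardless of author, and a document by Bob passes regardless of views. With `skip(10)` and `limit(20)`, the caller receives ranked results 11 through 30.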
Searus provides detailed scoring information:
for result in results {
    println!("Score: {:.3}", result.score);

    // Per-field scores
    for (field, score) in &result.field_scores {
        println!("  {}: {:.3}", field, score);
    }

    // Match details
    for detail in &result.details {
        match detail {
            SearchDetail::Semantic { matched_terms, .. } => {
                println!("  Matched: {}", matched_terms.join(", "));
            }
            SearchDetail::Tag { matched_tags, .. } => {
                println!("  Tags: {}", matched_tags.join(", "));
            }
            SearchDetail::Fuzzy { original_term, matched_term, similarity } => {
                println!("  {} → {} ({:.2})", original_term, matched_term, similarity);
            }
            _ => {}
        }
    }
}
Run the included examples:
# Basic semantic search
cargo run --example basic_semantic
# Multi-strategy search
cargo run --example multi_searcher
# Time check
cargo run --example time_check --features parallel
# Filters example
cargo run --example verify_filters
# Tagged TRT search
cargo run --example tagged_trt
sled or rocksdb).

Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.