Crates.io | probly-search |
lib.rs | probly-search |
version | 2.0.1 |
source | src |
created_at | 2021-06-06 06:29:14.337446 |
updated_at | 2024-07-03 07:15:47.05423 |
description | A lightweight full-text search engine with a fully customizable scoring function |
homepage | https://github.com/quantleaf/probly-search |
repository | https://github.com/quantleaf/probly-search |
id | 406754 |
size | 77,806 |
A full-text search library, written in Rust, optimized for insertion speed, that provides full control over the scoring calculations.
This started as a port of the Node library NDX.
Demo: recipe (title) search with 50k documents.
https://quantleaf.github.io/probly-search-demo/
Features:
- Three ways to do scoring:
  - the BM25 ranking function,
  - zero-to-one scoring,
  - or your own scoring function, by implementing the ScoreCalculator trait.
- Trie-based dynamic inverted index.
- Multi-field full-text indexing and searching.
- Per-field score boosting.
- Configurable tokenizer.
- Free-text queries with query expansion.
- Fast allocation, but latent deletion.
- WASM compatible.
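To illustrate what a trie-based inverted index with query expansion can look like, here is a minimal standalone sketch. It is not the crate's internal data structure: it assumes that each trie node stores the document keys (postings) for the term ending at that node, and that a query term is expanded to every indexed term having it as a prefix. The `TrieIndex`, `Node` and `Hits` names are hypothetical, chosen for illustration only.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical result type: (expanded term, documents containing it).
type Hits = Vec<(String, BTreeSet<usize>)>;

/// Hypothetical trie node (illustration only, not probly-search's type):
/// children keyed by character, plus the document keys whose text contains
/// the term that ends at this node.
#[derive(Default)]
struct Node {
    children: BTreeMap<char, Node>,
    postings: BTreeSet<usize>,
}

/// Hypothetical trie-based inverted index (illustration only).
#[derive(Default)]
struct TrieIndex {
    root: Node,
}

impl TrieIndex {
    /// Walk (creating as needed) the path for `term` and record `doc` at its last node.
    fn insert(&mut self, term: &str, doc: usize) {
        let mut node = &mut self.root;
        for c in term.chars() {
            node = node.children.entry(c).or_default();
        }
        node.postings.insert(doc);
    }

    /// Return every indexed term that starts with `prefix`, with its postings.
    /// An exact match is included, since a term is a prefix of itself.
    fn expand(&self, prefix: &str) -> Hits {
        let mut node = &self.root;
        for c in prefix.chars() {
            match node.children.get(&c) {
                Some(next) => node = next,
                None => return Vec::new(),
            }
        }
        let mut out = Vec::new();
        let mut term = prefix.to_string();
        Self::collect(node, &mut term, &mut out);
        out
    }

    /// Depth-first traversal collecting every term that has postings.
    fn collect(node: &Node, term: &mut String, out: &mut Hits) {
        if !node.postings.is_empty() {
            out.push((term.clone(), node.postings.clone()));
        }
        for (c, child) in &node.children {
            term.push(*c);
            Self::collect(child, term, out);
            term.pop();
        }
    }
}

fn main() {
    let mut index = TrieIndex::default();
    // Toy corpus: doc 0 = "abc dfg", doc 1 = "dfgh abcd".
    for (doc, text) in [(0, "abc dfg"), (1, "dfgh abcd")] {
        for term in text.split_whitespace() {
            index.insert(term, doc);
        }
    }
    // "abc" expands to the exact term "abc" (doc 0) and to "abcd" (doc 1).
    for (term, docs) in index.expand("abc") {
        println!("{term}: {docs:?}");
    }
}
```

Because expansion is a walk to the prefix node followed by a subtree traversal, its cost depends on the size of that subtree rather than on the total vocabulary.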
See the integration tests.
See the recipe search demo project.
Create an index for documents that have 2 fields, query the documents, and remove a document:
use std::borrow::Cow;

use probly_search::{
    index::Index,
    query::{
        score::default::bm25,
        QueryResult,
    },
};

// A white space tokenizer
fn tokenizer(s: &str) -> Vec<Cow<'_, str>> {
    s.split(' ').map(Cow::from).collect::<Vec<_>>()
}
// A custom document type with an id and two text fields
struct Doc {
    id: usize,
    title: String,
    description: String,
}

// We have to provide extraction functions for the fields we want to index

// Title
fn title_extract(d: &Doc) -> Vec<&str> {
    vec![d.title.as_str()]
}

// Description
fn description_extract(d: &Doc) -> Vec<&str> {
    vec![d.description.as_str()]
}
fn main() {
    // Create an index with 2 fields, keyed by usize document ids
    let mut index = Index::<usize>::new(2);

    // Create docs from the custom Doc struct
    let doc_1 = Doc {
        id: 0,
        title: "abc".to_string(),
        description: "dfg".to_string(),
    };
    let doc_2 = Doc {
        id: 1,
        title: "dfgh".to_string(),
        description: "abcd".to_string(),
    };

    // Add documents to the index
    index.add_document(
        &[title_extract, description_extract],
        tokenizer,
        doc_1.id,
        &doc_1,
    );
    index.add_document(
        &[title_extract, description_extract],
        tokenizer,
        doc_2.id,
        &doc_2,
    );

    // Search with BM25; `&[1., 1.]` gives the title and description fields equal boosts.
    // Expect 2 results: "abc" matches doc_1's title exactly and doc_2's "abcd" by expansion.
    let mut result = index.query(
        &"abc",
        &mut bm25::new(),
        tokenizer,
        &[1., 1.],
    );
    assert_eq!(result.len(), 2);
    assert_eq!(
        result[0],
        QueryResult {
            key: 0,
            score: 0.6931471805599453
        }
    );
    assert_eq!(
        result[1],
        QueryResult {
            key: 1,
            score: 0.28104699650060755
        }
    );

    // Remove a document from the index
    index.remove_document(doc_1.id);

    // Vacuum to purge the removed document completely
    index.vacuum();

    // Search again, expect 1 result
    result = index.query(
        &"abc",
        &mut bm25::new(),
        tokenizer,
        &[1., 1.],
    );
    assert_eq!(result.len(), 1);
    assert_eq!(
        result[0],
        QueryResult {
            key: 1,
            score: 0.1166450426074421
        }
    );
}
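The feature list describes "fast allocation, but latent deletion", and the example calls `remove_document` followed by `vacuum`. One common way to get that trade-off is tombstoning, sketched below as a standalone toy under that assumption; it is not the crate's code. Removal only records the key as deleted, queries filter out recorded keys on the fly, and a separate vacuum pass actually purges the postings. `LazyIndex` and its methods are hypothetical names.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical toy index illustrating latent (lazy) deletion.
/// Not probly-search's implementation.
#[derive(Default)]
struct LazyIndex {
    /// term -> document keys containing that term
    postings: HashMap<String, Vec<usize>>,
    /// keys marked as removed but not yet purged from `postings`
    removed: HashSet<usize>,
}

impl LazyIndex {
    /// Index every whitespace-separated term of `text` under `key`.
    fn add(&mut self, key: usize, text: &str) {
        for term in text.split_whitespace() {
            self.postings.entry(term.to_string()).or_default().push(key);
        }
    }

    /// Cheap: only records a tombstone; the postings are left untouched.
    fn remove(&mut self, key: usize) {
        self.removed.insert(key);
    }

    /// Queries skip tombstoned keys on the fly.
    fn lookup(&self, term: &str) -> Vec<usize> {
        match self.postings.get(term) {
            Some(keys) => keys
                .iter()
                .copied()
                .filter(|k| !self.removed.contains(k))
                .collect(),
            None => Vec::new(),
        }
    }

    /// Purge tombstoned keys from every posting list and drop empty terms.
    fn vacuum(&mut self) {
        let removed = &self.removed;
        for keys in self.postings.values_mut() {
            keys.retain(|k| !removed.contains(k));
        }
        self.postings.retain(|_, keys| !keys.is_empty());
        self.removed.clear();
    }

    /// Number of stored postings, including ones not yet purged.
    fn posting_count(&self) -> usize {
        self.postings.values().map(Vec::len).sum()
    }
}

fn main() {
    let mut index = LazyIndex::default();
    index.add(0, "abc dfg");
    index.add(1, "dfgh abcd");
    index.remove(0);
    // Doc 0 is already hidden from queries...
    println!("{:?}", index.lookup("abc")); // []
    // ...but its postings still occupy memory until vacuum.
    println!("{}", index.posting_count()); // 4
    index.vacuum();
    println!("{}", index.posting_count()); // 2
}
```

The design choice is to keep removal cheap and batch the expensive cleanup, at the cost of stale postings lingering until `vacuum` runs.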
Go through the source tests for the BM25 and zero-to-one implementations for more query examples.
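As background for reading the BM25 tests, here is the textbook Okapi BM25 contribution of one query term to one field, written as a small standalone function. It is not taken from the crate's source. The parameters k1 = 1.2 and b = 0.75 and the smoothed IDF variant ln(1 + (N - n + 0.5) / (n + 0.5)) are common choices assumed here, and the crate's scorer (which also applies per-field boosts and handles expanded terms) may differ. `bm25_term_score`, `K1` and `B` are hypothetical names.

```rust
/// Assumed, commonly used BM25 parameters (not read from probly-search).
const K1: f64 = 1.2;
const B: f64 = 0.75;

/// Hypothetical helper: textbook Okapi BM25 score of one term in one field.
fn bm25_term_score(
    tf: f64,             // occurrences of the term in this field
    field_len: f64,      // number of terms in this field
    avg_field_len: f64,  // average length of this field across the corpus
    n_docs: f64,         // total number of documents (N)
    docs_with_term: f64, // number of documents containing the term (n)
) -> f64 {
    // Smoothed inverse document frequency; always positive.
    let idf = (1.0 + (n_docs - docs_with_term + 0.5) / (docs_with_term + 0.5)).ln();
    // Length normalisation: longer-than-average fields are penalised.
    let norm = 1.0 - B + B * field_len / avg_field_len;
    // Saturating term frequency weighted by idf.
    idf * tf * (K1 + 1.0) / (tf + K1 * norm)
}

fn main() {
    // One occurrence in a one-term field of average length, in a
    // 2-document corpus where 1 document contains the term.
    let score = bm25_term_score(1.0, 1.0, 1.0, 2.0, 1.0);
    println!("{score:.4}"); // 0.6931, i.e. ln 2
    // The same single occurrence in a longer-than-average field scores lower.
    let longer = bm25_term_score(1.0, 3.0, 2.0, 2.0, 1.0);
    println!("{longer:.4}");
}
```

In the first call the term-frequency factor is exactly 1, so the score reduces to the IDF, ln 2. That equals the score asserted for key 0 in the example; whether the crate arrives at it through this same formula has not been checked here.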
Run all tests with
cargo test
Run all benchmarks with
cargo bench