| Field | Value |
|---|---|
| Crates.io | simple_db_nn |
| lib.rs | simple_db_nn |
| version | 0.1.0 |
| created_at | 2025-07-28 17:59:33.111172+00 |
| updated_at | 2025-07-28 17:59:33.111172+00 |
| description | Very stupid and simple db with nearest neighbors algorithms to use with embeddings |
| homepage | |
| repository | https://github.com/carlosb1/simple-db-nn |
| max_upload_size | |
| id | 1771436 |
| size | 106,237 |
SimpleDBNN is a Rust library that combines a lightweight text database with a vector index for similarity search over embeddings. It can be embedded in any project and works with any embedding engine that implements the `Embeddable` trait.
- Storage and indexing built on `heed` and `arroy`
- Pluggable embedding engines through the `Embeddable` trait

Add this to your `Cargo.toml`:
```toml
[dependencies]
simple_db_nn = "0.1"
```
Note: `"0.1"` resolves to the latest 0.1.x release published on crates.io (0.1.0 at the time of writing).
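For example, a dummy engine for testing maps content that starts with `$` to one 384-dimensional vector and everything else to another:

```rust
use simple_db_nn::Embeddable;

/// Test-only engine: content starting with `$` maps to a vector of 100.0s,
/// everything else to a zero vector (both 384-dimensional).
struct DummyEmbedding;

impl Embeddable for DummyEmbedding {
    fn to_embedding(&self, content: Vec<u8>) -> Vec<f32> {
        // Lossy decoding avoids panicking on invalid UTF-8 input.
        let content_str = String::from_utf8_lossy(&content);
        if content_str.starts_with('$') {
            vec![100.0; 384]
        } else {
            vec![0.0; 384]
        }
    }
}
```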
Then create a database, insert some text, and query it:

```rust
use arroy::distances::Euclidean;
use simple_db_nn::{Embeddable, SimpleDBNN};
use std::path::PathBuf;

fn main() {
    let mut db = SimpleDBNN::new(
        PathBuf::from("./db"),
        PathBuf::from("./embedded_db"),
        PathBuf::from("./config.json"),
        DummyEmbedding,
        384, // embedding dimensions: must match the length returned by to_embedding
        0,
        42,
    )
    .unwrap();

    db.put("Hello world").unwrap();

    // Retrieve up to 4 entries most similar to the query.
    let results = db.get("Hello", 4).unwrap();
    for (id, dist, text) in results {
        println!("ID: {id}, Distance: {dist}, Text: {text}");
    }
}
```
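To see what the `Distance` column means with `DummyEmbedding`, the sketch below computes the distance between two of its vectors by hand. It assumes the database reports the plain Euclidean (L2) distance; the usage example imports `arroy::distances::Euclidean`, but whether the reported value is squared or otherwise normalized is not stated here. `dummy_embedding` and `euclidean` are hypothetical helpers written only for this illustration.

```rust
/// Hypothetical stand-in for `DummyEmbedding::to_embedding`:
/// text starting with '$' maps to a vector of 100.0s, anything else to zeros.
fn dummy_embedding(text: &str) -> Vec<f32> {
    if text.starts_with('$') {
        vec![100.0; 384]
    } else {
        vec![0.0; 384]
    }
}

/// Plain Euclidean (L2) distance between two equal-length vectors.
fn euclidean(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f32>()
        .sqrt()
}
```

With this embedding, `"Hello"` and `"Hello world"` both map to the zero vector, so their distance is 0, while any `$`-prefixed text lies at 100 × √384 ≈ 1959.59 from both of them.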
| Function | Description |
|---|---|
| `put(&str)` | Insert and index content |
| `get(&str, usize)` | Search the top-n most similar entries |
| `put_batch(Vec<&str>)` | Insert a batch of entries |
| `clear()` | Delete all persisted data |
| `get_current_id()` | Get the next internal ID to be assigned |
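These operations fit a simple mental model: each insert takes the next ID, stores the text, and indexes its embedding; a query embeds the search text and returns the closest `n` entries. The sketch below is a hypothetical, purely in-memory analog of that model (`InMemoryDb` and its methods are illustrative, not the crate's API). It swaps `arroy`'s approximate index for exact brute-force search, assumes results are `(id, distance, text)` tuples sorted by ascending Euclidean distance as in the usage example, and assumes `clear` also resets the ID counter.

```rust
/// Hypothetical in-memory analog of the SimpleDBNN operations.
/// Uses exact brute-force search instead of an approximate (arroy) index.
struct InMemoryDb<F: Fn(&str) -> Vec<f32>> {
    embed: F,                              // stand-in for an `Embeddable` engine
    entries: Vec<(u32, Vec<f32>, String)>, // (id, embedding, text)
    next_id: u32,                          // next ID to be assigned
}

impl<F: Fn(&str) -> Vec<f32>> InMemoryDb<F> {
    fn new(embed: F) -> Self {
        Self { embed, entries: Vec::new(), next_id: 0 }
    }

    /// Insert and index one entry; returns the ID it was given.
    fn put(&mut self, text: &str) -> u32 {
        let id = self.next_id;
        let vector = (self.embed)(text);
        self.entries.push((id, vector, text.to_string()));
        self.next_id += 1;
        id
    }

    /// Insert several entries in order.
    fn put_batch(&mut self, texts: Vec<&str>) {
        for text in texts {
            self.put(text);
        }
    }

    /// Return up to `n` entries, closest first, as (id, distance, text).
    fn get(&self, query: &str, n: usize) -> Vec<(u32, f32, String)> {
        let q = (self.embed)(query);
        let mut scored: Vec<(u32, f32, String)> = self
            .entries
            .iter()
            .map(|(id, v, text)| {
                let dist = v.iter().zip(&q).map(|(a, b)| (a - b).powi(2)).sum::<f32>().sqrt();
                (*id, dist, text.clone())
            })
            .collect();
        // Stable sort: ties keep insertion order.
        scored.sort_by(|a, b| a.1.total_cmp(&b.1));
        scored.truncate(n);
        scored
    }

    /// Drop all entries; assumption: the ID counter restarts at 0.
    fn clear(&mut self) {
        self.entries.clear();
        self.next_id = 0;
    }

    fn get_current_id(&self) -> u32 {
        self.next_id
    }
}
```

For instance, with a toy embedding `|t: &str| vec![t.len() as f32, 0.0]`, inserting `"ab"`, `"abcd"` and `"abcdefgh"` and querying `"a"` with `n = 2` returns `"ab"` (distance 1) followed by `"abcd"` (distance 3).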
Implement the `Embeddable` trait to integrate your own embedding engine:

```rust
pub trait Embeddable {
    fn to_embedding(&self, content: Vec<u8>) -> Vec<f32>;
}
```
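As an example of a custom engine, the sketch below implements the trait with feature hashing: overlapping byte trigrams are hashed (64-bit FNV-1a) into a fixed number of buckets and the counts are L2-normalized. `HashingEmbedding` and `fnv1a` are hypothetical names, and the trait is re-declared locally with the signature shown above only so the block compiles on its own; in a real project you would import it from `simple_db_nn` instead.

```rust
/// Mirrors the `Embeddable` signature above so this sketch is self-contained;
/// in a real project, use `simple_db_nn::Embeddable` instead.
pub trait Embeddable {
    fn to_embedding(&self, content: Vec<u8>) -> Vec<f32>;
}

/// Hypothetical toy engine: hashes overlapping byte trigrams into `dims`
/// buckets ("feature hashing"), then L2-normalizes the counts.
pub struct HashingEmbedding {
    pub dims: usize, // must be > 0 and match the dimension the database expects
}

/// 64-bit FNV-1a hash: small, deterministic, and dependency-free.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

impl Embeddable for HashingEmbedding {
    fn to_embedding(&self, content: Vec<u8>) -> Vec<f32> {
        assert!(self.dims > 0, "dims must be positive");
        let mut vector = vec![0.0_f32; self.dims];
        // Inputs shorter than 3 bytes are hashed as a single gram.
        let grams: Vec<&[u8]> = if content.len() < 3 {
            vec![&content[..]]
        } else {
            content.windows(3).collect()
        };
        for gram in grams {
            let bucket = (fnv1a(gram) % self.dims as u64) as usize;
            vector[bucket] += 1.0;
        }
        // L2-normalize so distances reflect content rather than input length.
        let norm = vector.iter().map(|x| x * x).sum::<f32>().sqrt();
        if norm > 0.0 {
            for x in &mut vector {
                *x /= norm;
            }
        }
        vector
    }
}
```

Because the engine works on raw bytes, it never decodes UTF-8 and cannot fail on arbitrary input. It only captures surface (lexical) similarity; for semantic search, a real model backend such as `fastembed` is the better choice. Presumably it would be passed to `SimpleDBNN::new` in place of `DummyEmbedding`, with a matching dimension argument.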
Run tests with:

```sh
cargo test
```
The tests cover both `DummyEmbedding` and `FastEmbedding`.

This project is dual-licensed under MIT or Apache-2.0; choose whichever you prefer.
Built with:

- `arroy` for approximate nearest neighbor indexing
- `heed` for efficient LMDB-based storage
- `fastembed` for real embedding backends

Have a custom embedding engine? Just implement `Embeddable` and you're ready.
Pull requests and suggestions are welcome ❤️