bigsi_rs

Crates.iobigsi_rs
lib.rsbigsi_rs
version0.1.1
sourcesrc
created_at2020-02-15 04:19:38.655865
updated_at2023-03-30 15:14:40.814476
descriptionA in-memory implementation of a BIGSI-like data structure
homepage
repositoryhttps://github.com/hcdenbakker/bigsi_rs
max_upload_size
id209320
size13,242
Henk den Bakker (hcdenbakker)

documentation

README

bigsi_rs

Rust in-memory implementation of a BIGSI-like data structure

Rust in-memory implementation of a BIGSI-like data structure (see https://www.nature.com/articles/s41587-018-0010-1). Comparable to a bloom filter; where a bloom filter tells if an element belongs to a single previously indexed set, a BIGSI-like data structure efficiently tells if an elements is a member of multiple query sets. Parameters (in particular the index size and the number of hashes) should be chosen to assure a (down-stream) application permissable false positive probabilty. My strategy is to base this on the largest set to be indexed and use the formula of Goel and Gupta 2010 (https://web.stanford.edu/~ashishg/papers/inverted.pdf) to calculate the false postive probability for this set, and hence the maximum false positive probability for the index.

Example

use bigsi_rs;


//create a new index of size 250,000, 10 accessions and 3 hash functions
let mut new_filter = bigsi_rs::Bigsi::new(250000, 10, 3);

//insert words in index for accessions 0, 3 and 7        
new_filter.insert(0, "ATGT");
new_filter.insert(3, "ATGT");
new_filter.insert(7, "ATGT");

//shrink uninformative elements in index
new_filter.slim();

assert_eq!(new_filter.get("ATGT").len(), 3 as usize);
assert_eq!(new_filter.get("ATGC").len(), 0 as usize);
Commit count: 14

cargo fmt