genedex

Crates.io: genedex
lib.rs: genedex
version: 0.2.2
created_at: 2025-09-15 19:18:37.275701+00
updated_at: 2025-09-25 17:00:09.368844+00
description: A small and fast FM-Index implementation
homepage: https://github.com/feldroop/genedex
repository: https://github.com/feldroop/genedex
id: 1840497
size: 205,370
Felix Leander Droop (feldroop)

documentation: https://docs.rs/genedex

README

⚡genedex: A Small and Fast FM-Index for Rust⚡

The FM-Index is a full-text index data structure that efficiently counts and retrieves the positions of all occurrences of short sequences in very large texts. It is widely used in sequence analysis and bioinformatics.

The implementation of this library is based on an encoding for the text with rank support data structure (a.k.a. occurrence table) by Simon Gene Gottlieb, who was also a great help while developing the library. This data structure is central to the inner workings of the FM-Index. The encoding attempts to provide a good trade-off between memory usage and running time of queries. A second encoding, which is faster but less memory efficient, is also implemented in this library. Further benefits of genedex include:

  • Fast, parallel and memory efficient index construction by leveraging libsais-rs and rayon.
  • Support for indexing a set of texts, like chromosomes of a genome.
  • Optimized functions for searching multiple queries at once (within a single thread; this is not multithreading).
  • A flexible cursor API.
  • Fast reading and writing of the FM-Index from/to files, using savefile.
  • Thoroughly tested using proptest.
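
To illustrate why the text with rank support (occurrence table) is central to the FM-Index, the following is a minimal sketch of counting by backward search. It is not how genedex is implemented: `NaiveFmIndex` and its fields are hypothetical names chosen for this illustration, the suffix array is built by naive sorting instead of libsais-rs, and the occurrence table is stored uncompressed, which costs far more memory than the encodings described above.

```rust
/// Hypothetical, unoptimized FM-Index that only supports counting.
struct NaiveFmIndex {
    /// less[c] = number of symbols in the text (including the sentinel)
    /// that are strictly smaller than c.
    less: [usize; 256],
    /// occ[i][c] = number of occurrences of c in the first i symbols of the BWT.
    occ: Vec<[usize; 256]>,
}

impl NaiveFmIndex {
    fn new(text: &[u8]) -> Self {
        // The byte 0 is used as a sentinel that sorts before every text symbol.
        assert!(!text.contains(&0), "text must not contain the sentinel byte 0");
        let mut t = text.to_vec();
        t.push(0);
        let n = t.len();

        // Naive suffix array: sort all suffix start positions lexicographically.
        let mut sa: Vec<usize> = (0..n).collect();
        sa.sort_by(|&a, &b| t[a..].cmp(&t[b..]));

        // Burrows-Wheeler transform: the symbol preceding each sorted suffix.
        let bwt: Vec<u8> = sa.iter().map(|&i| t[(i + n - 1) % n]).collect();

        // Count each symbol, then take exclusive prefix sums to obtain `less`.
        let mut counts = [0usize; 256];
        for &c in &t {
            counts[c as usize] += 1;
        }
        let mut less = [0usize; 256];
        let mut sum = 0;
        for (c, slot) in less.iter_mut().enumerate() {
            *slot = sum;
            sum += counts[c];
        }

        // Full occurrence table: one row of running counts per BWT prefix.
        let mut occ = Vec::with_capacity(n + 1);
        let mut running = [0usize; 256];
        occ.push(running);
        for &c in &bwt {
            running[c as usize] += 1;
            occ.push(running);
        }

        Self { less, occ }
    }

    /// Counts occurrences of a non-empty `query` using backward search.
    fn count(&self, query: &[u8]) -> usize {
        // Half-open interval [lo, hi) of suffix array positions.
        let (mut lo, mut hi) = (0, self.occ.len() - 1);
        // Process the query from its last symbol to its first.
        for &c in query.iter().rev() {
            let c = c as usize;
            lo = self.less[c] + self.occ[lo][c];
            hi = self.less[c] + self.occ[hi][c];
            if lo >= hi {
                return 0;
            }
        }
        hi - lo
    }
}
```

Each query symbol costs two lookups in the occurrence table, so counting takes time proportional to the query length and is independent of the text length. This is why the memory layout and speed of the rank support structure dominate the performance of an FM-Index.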

⚠️ Warning: this library is in an early stage. The API is still subject to change. Currently, only a basic FM-Index is implemented. For upcoming features, take a look at the roadmap. Any kind of feedback and suggestions via the issue tracker are highly appreciated! ⚠️

Usage

For detailed information about how to use genedex, please refer to the documentation. The following is an example of the most basic functionality:

use genedex::{FmIndexConfig, alphabet};

let dna_n_alphabet = alphabet::ascii_dna_with_n();
let texts = [b"aACGT", b"acGtn"];

let index = FmIndexConfig::<i32>::new().construct_index(texts, dna_n_alphabet);

let query = b"GT";
assert_eq!(index.count(query), 2);

for hit in index.locate(query) {
    println!(
        "Found query in text {} at position {}.",
        hit.text_id, hit.position
    );
}
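
As a cross-check for the example above, the same hits can be computed with a brute-force scan that does not use genedex at all. This is only a sketch: `NaiveHit` and `naive_locate` are hypothetical names, and the case-insensitive comparison is an assumption inferred from the example, where the query "GT" is counted twice although the second text contains "Gt".

```rust
/// Hypothetical hit record: which text matched and where the match starts.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct NaiveHit {
    text_id: usize,
    position: usize,
}

/// Brute-force search of `query` in every text, ignoring ASCII case.
fn naive_locate(texts: &[&[u8]], query: &[u8]) -> Vec<NaiveHit> {
    let mut hits = Vec::new();
    // `windows(0)` would panic, so an empty query yields no hits here.
    if query.is_empty() {
        return hits;
    }
    for (text_id, text) in texts.iter().enumerate() {
        // Slide a window of the query's length over the text.
        for (position, window) in text.windows(query.len()).enumerate() {
            if window.eq_ignore_ascii_case(query) {
                hits.push(NaiveHit { text_id, position });
            }
        }
    }
    hits
}
```

For the texts "aACGT" and "acGtn" and the query "GT", this scan finds one hit in text 0 at position 3 and one hit in text 1 at position 2. The order in which an FM-Index reports hits may differ from the order of this scan.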

Comparison to Other Crates and Benchmarks

Work in progress. Can be found here

Commit count: 106
