sbwt

Crates.iosbwt
lib.rssbwt
version0.4.0
created_at2024-03-12 23:20:49.541295+00
updated_at2025-05-24 18:53:38.614831+00
descriptionIndexing sets of DNA k-mers with the spectral Burrow-Wheeler transform.
homepage
repositoryhttps://github.com/jnalanko/sbwt-rs-cli/tree/master/api
max_upload_size
id1171144
size865,644
Jarno N. Alanko (jnalanko)

documentation

README

SBWT: A Burrows-Wheeler transform for DNA k-mers

This crate provides an implementation of the Bit Matrix SBWT data structure, as described in Small Searchable k-Spectra via Subset Rank Queries on the Spectral Burrows-Wheeler Transform, for the DNA alphabet ACGT. The data structure uses a variant of the Burrows-Wheeler transform to compress a set of k-mers in way that allows fast lookup queries. If the input k-mers are consecutive k-mers from longer underlying sequences, the index takes typically around 5 bits per distinct k-mer, supporting k-mer lookup queries at a speed of around 1 μs / k-mer on modern hardware.

Queries can be further sped up by using the Longest common suffix array, (see here) taking roughly log(k) bits of space per k-mer. The LCS array also enables the computation of k-bounded matching statistics and shortest frequency-bounded suffixes (both described here). Finally, the crate provides an interface for traversing the node-centric de Bruijn graph of the k-mers.

API documentation available at https://docs.rs/sbwt/latest/sbwt/.

A CLI for key features of the API can be found at https://github.com/jnalanko/sbwt-rs-cli. The API is developed together with the CLI at https://github.com/jnalanko/sbwt-rs-cli/tree/master/api.

Changes between releases are documented in the changelog.

Commit count: 102

cargo fmt