awry

created_at: 2024-12-10 22:40:37.532912
updated_at: 2024-12-12 22:33:51.318278
description: Library for creating FM-indexes from FASTA/FASTQ files. AWRY is able to search at lightning speed by leveraging SIMD vectorization and multithreading over collections of queries.
homepage: https://github.com/UM-Applied-Algorithms-Lab/AWRY_Index
repository: https://github.com/UM-Applied-Algorithms-Lab/AWRY_Index
Tim Anderson (Sawwave)

AWRY

AVX Windowed FM-index in Rust? Yes!

AWRY generates an FM-index of a given biological sequence text (a FASTA or FASTQ file) and implements Locate() and Search() functionality over it.

AWRY is a port of a state-of-the-art, fastest-in-its-class FM-index implementation (https://doi.org/10.1186/s13015-021-00204-6). AWRY also supports parallelized searching through the parallel_count() and parallel_locate() functions.
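
For readers new to FM-indexes, the following is a minimal textbook sketch of backward search, the technique that count and locate queries are built on. It is not AWRY's implementation: it uses a naive suffix sort, a linear-scan occurrence function, and no SIMD, and every name in it (ToyFmIndex, occ, search) is hypothetical.

```rust
/// Toy FM-index over a byte text. All names are hypothetical and do not
/// correspond to AWRY's types.
struct ToyFmIndex {
    sa: Vec<usize>,  // full (unsampled) suffix array
    bwt: Vec<u8>,    // Burrows-Wheeler transform of text + sentinel
    c: [usize; 256], // c[s] = number of text symbols strictly smaller than s
}

impl ToyFmIndex {
    fn new(text: &str) -> Self {
        // Append a sentinel that sorts before every nucleotide letter.
        let mut t = text.as_bytes().to_vec();
        t.push(b'$');
        // Naive suffix sort: acceptable for tiny inputs only.
        let mut sa: Vec<usize> = (0..t.len()).collect();
        sa.sort_by(|&a, &b| t[a..].cmp(&t[b..]));
        // bwt[i] is the symbol that precedes the i-th sorted suffix (cyclically).
        let bwt: Vec<u8> = sa.iter().map(|&i| t[(i + t.len() - 1) % t.len()]).collect();
        // Exclusive prefix sums over the symbol counts.
        let mut counts = [0usize; 256];
        for &s in &t {
            counts[s as usize] += 1;
        }
        let mut c = [0usize; 256];
        let mut sum = 0;
        for s in 0..256 {
            c[s] = sum;
            sum += counts[s];
        }
        ToyFmIndex { sa, bwt, c }
    }

    /// Occurrences of `s` in bwt[..i]. Real indexes precompute this table.
    fn occ(&self, s: u8, i: usize) -> usize {
        self.bwt[..i].iter().filter(|&&b| b == s).count()
    }

    /// Backward search: half-open suffix array range of suffixes starting with `query`.
    fn search(&self, query: &str) -> (usize, usize) {
        let (mut lo, mut hi) = (0, self.bwt.len());
        for &s in query.as_bytes().iter().rev() {
            lo = self.c[s as usize] + self.occ(s, lo);
            hi = self.c[s as usize] + self.occ(s, hi);
            if lo >= hi {
                return (0, 0); // no suffix starts with the query
            }
        }
        (lo, hi)
    }

    /// Counting only needs the width of the range.
    fn count(&self, query: &str) -> usize {
        let (lo, hi) = self.search(query);
        hi - lo
    }

    /// Locating additionally reads text positions out of the suffix array.
    fn locate(&self, query: &str) -> Vec<usize> {
        let (lo, hi) = self.search(query);
        let mut positions = self.sa[lo..hi].to_vec();
        positions.sort_unstable();
        positions
    }
}
```

For the text "ACGTACGT", counting "ACG" with this toy index yields 2 and locating it yields offsets 0 and 4. Note that count never touches the suffix array, while locate does.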

Building an FM-index

To build an FM-index, create an FmBuildArgs struct and pass it to FmIndex::new().

let build_args = FmBuildArgs {
    input_file_src: "my_input.fa",               // input file for the database text
    suffix_array_output_src: None,               // None builds to a default location
    suffix_array_compression_ratio: None,        // ratio of suffix array compression, 8 by default
    lookup_table_kmer_len: None,                 // None chooses reasonable table sizes (Dna=13, Amino=5)
    alphabet: SymbolAlphabet::Nucleotide,        // alphabet to build the index over
    max_query_len: None,                         // if set, only sort the suffix array up to n positions
    remove_intermediate_suffix_array_file: true, // deletes the suffix array file if true
};

let fm_index = FmIndex::new(&build_args);

If you only intend to use the count functions, you can set suffix_array_compression_ratio to a high value such as 255 to reduce memory usage, because counting does not need to read positions from the suffix array.
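
As a rough illustration of why a higher ratio saves memory, the sketch below estimates the size of a sampled suffix array. It assumes that one position is stored per `ratio` text positions and that each stored position takes a fixed number of bytes. Both are assumptions made for illustration, not a description of AWRY's actual on-disk or in-memory layout, and the function name is hypothetical.

```rust
/// Approximate bytes needed for a sampled suffix array.
/// Assumption (not taken from AWRY): one stored position per `ratio`
/// text positions, each occupying `bytes_per_entry` bytes.
fn sampled_sa_bytes(text_len: u64, ratio: u64, bytes_per_entry: u64) -> u64 {
    assert!(ratio > 0, "compression ratio must be positive");
    // Round up so a partial final block still stores one sample.
    let samples = (text_len + ratio - 1) / ratio;
    samples * bytes_per_entry
}
```

Under these assumptions, a text of 1,000,000 symbols with 8-byte entries needs about 1,000,000 bytes at ratio 8 and about 31,376 bytes at ratio 255.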

Searching for a query

To search for a query, use the count_string and locate_string functions.

/// Counts the occurrences of the given query in the original text.
pub fn count_string(&self, query: &String) -> u64 {
    ...
}

/// Finds the locations in the original text of all instances of the given query.
pub fn locate_string(&self, query: &String) -> Vec<u64> {
    ...
}
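
To make the expected results of these two calls concrete, here is a hypothetical stand-in type that mirrors the two signatures with a naive scan. NaiveIndex is not part of AWRY, and the coordinate convention it uses (sorted, 0-based offsets into a single sequence) is an assumption. It does show one property that follows from how FM-index search works: overlapping occurrences are each counted.

```rust
/// Hypothetical stand-in mirroring the two signatures above.
/// It scans the text directly instead of using an FM-index.
struct NaiveIndex {
    text: String,
}

impl NaiveIndex {
    /// Start offset of every occurrence of `query`, overlaps included.
    fn locate_string(&self, query: &String) -> Vec<u64> {
        let (t, q) = (self.text.as_bytes(), query.as_bytes());
        if q.is_empty() {
            return Vec::new(); // windows(0) would panic
        }
        t.windows(q.len())
            .enumerate()
            .filter(|(_, w)| *w == q)
            .map(|(i, _)| i as u64)
            .collect()
    }

    /// Number of occurrences of `query`.
    fn count_string(&self, query: &String) -> u64 {
        self.locate_string(query).len() as u64
    }
}
```

With the text "AAAA" and the query "AA", this reference returns a count of 3 and the offsets 0, 1, and 2.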

Searching for queries in parallel

To search for a large number of queries, use the parallel_count and parallel_locate functions, which parallelize the search across the query list.

/// Counts the occurrences of each query in the query list.
pub fn parallel_count(&self, queries: &Vec<String>) -> Vec<u64> {
    ...
}

/// Finds the locations for each query in the query list. This function uses rayon's into_par_iter() for parallelism.
pub fn parallel_locate(&self, queries: &Vec<String>) -> Vec<Vec<u64>> {
    ...
}
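
The per-query parallelism described above can be sketched without any dependencies. AWRY itself is stated to use rayon's into_par_iter(); the sketch below swaps in std::thread::scope with manual chunking so that it stays self-contained. The names count_one and parallel_count_sketch are hypothetical, and the naive scan stands in for a real index lookup.

```rust
use std::thread;

/// Naive stand-in for a single count query (hypothetical, not AWRY's search).
fn count_one(text: &str, query: &str) -> u64 {
    if query.is_empty() {
        return 0; // windows(0) would panic
    }
    text.as_bytes()
        .windows(query.len())
        .filter(|w| *w == query.as_bytes())
        .count() as u64
}

/// Counts every query, giving each worker thread one contiguous chunk.
/// Results are returned in the same order as `queries`.
fn parallel_count_sketch(text: &str, queries: &[String], workers: usize) -> Vec<u64> {
    let workers = workers.max(1);
    // chunks() panics on a chunk size of 0, so clamp to at least 1.
    let chunk_len = ((queries.len() + workers - 1) / workers).max(1);
    thread::scope(|scope| {
        // Spawn all workers first so they run concurrently.
        let handles: Vec<_> = queries
            .chunks(chunk_len)
            .map(|chunk| {
                scope.spawn(move || {
                    chunk.iter().map(|q| count_one(text, q)).collect::<Vec<u64>>()
                })
            })
            .collect();
        // Join in spawn order to preserve the input ordering.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect::<Vec<u64>>()
    })
}
```

Because the index is only read during a search, each query is independent of the others, which is what makes this kind of split straightforward. A work-stealing pool such as rayon's balances uneven query costs better than the fixed chunks used here.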
Commit count: 243
