Crates.io | awry |
lib.rs | awry |
version | |
source | src |
created_at | 2024-12-10 22:40:37.532912 |
updated_at | 2024-12-12 22:33:51.318278 |
description | Library for creating FM-indexes from FASTA/FASTQ files. AWRY is able to search at lightning speed by leveraging SIMD vectorization and multithreading over collections of queries. |
homepage | https://github.com/UM-Applied-Algorithms-Lab/AWRY_Index |
repository | https://github.com/UM-Applied-Algorithms-Lab/AWRY_Index |
id | 1479127 |
AVX Windowed FM-index in Rust? Yes!
AWRY generates an FM-index of a given biological sequence text (a FASTA or FASTQ file) and implements Locate() and Search() functionality.
AWRY is a port of a state-of-the-art, fastest-in-its-class FM-index implementation (https://doi.org/10.1186/s13015-021-00204-6). AWRY supports parallelized searching with the parallel_count() and parallel_locate() functions.
To build an FM-index, create an FmBuildArgs struct and pass it to FmIndex::new():
let build_args = FmBuildArgs {
    input_file_src: "my_input.fa",                // input file for the database text
    suffix_array_output_src: None,                // None builds to a default location
    suffix_array_compression_ratio: None,         // suffix array compression ratio, 8 by default
    lookup_table_kmer_len: None,                  // None chooses reasonable table sizes (DNA = 13, amino = 5)
    alphabet: SymbolAlphabet::Nucleotide,         // alphabet to build the index over
    max_query_len: None,                          // if set, only sort the suffix array up to n positions
    remove_intermediate_suffix_array_file: true,  // deletes the suffix array file if true
};
let fm_index = FmIndex::new(&build_args);
If you only intend to use the count function, you can set the suffix array compression to a high value like 255 to reduce memory usage.
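The trade-off behind that setting can be sketched with a toy index. This is not AWRY's code: it assumes that a compression ratio of k means keeping one suffix array entry out of every k rows, and the names SampledIndex, naive_suffix_array, resolve_row and sampled_len are made up for illustration. Counting never reads the suffix array, so a sparse sample costs nothing there; locating has to walk backwards through the BWT until it reaches a sampled row, which gets slower as the ratio grows.

```rust
/// Builds the suffix array of `text` by direct sorting (fine for toy inputs only).
pub fn naive_suffix_array(text: &[u8]) -> Vec<usize> {
    let mut sa: Vec<usize> = (0..text.len()).collect();
    sa.sort_by(|&a, &b| text[a..].cmp(&text[b..]));
    sa
}

/// Hypothetical toy index that keeps only every `ratio`-th suffix array row.
pub struct SampledIndex {
    bwt: Vec<u8>,
    /// c_table[b] = number of symbols in the text strictly smaller than b.
    c_table: [usize; 256],
    /// sampled[i / ratio] holds SA[i] for every row i that is a multiple of `ratio`.
    sampled: Vec<usize>,
    ratio: usize,
}

impl SampledIndex {
    /// `text` must end with a unique, smallest sentinel byte (0 in this sketch).
    pub fn new(text: &[u8], ratio: usize) -> Self {
        assert!(ratio >= 1, "ratio must be at least 1");
        let n = text.len();
        let sa = naive_suffix_array(text);
        // BWT[i] is the symbol that precedes suffix SA[i], wrapping around.
        let bwt: Vec<u8> = sa.iter().map(|&p| text[(p + n - 1) % n]).collect();
        let mut c_table = [0usize; 256];
        for &b in text {
            // Every symbol larger than `b` has one more smaller symbol.
            for larger in (b as usize + 1)..256 {
                c_table[larger] += 1;
            }
        }
        let sampled: Vec<usize> = sa.iter().copied().step_by(ratio).collect();
        Self { bwt, c_table, sampled, ratio }
    }

    /// LF mapping: the row of the suffix that starts one position earlier.
    fn lf(&self, row: usize) -> usize {
        let symbol = self.bwt[row];
        let rank = self.bwt[..row].iter().filter(|&&b| b == symbol).count();
        self.c_table[symbol as usize] + rank
    }

    /// Recovers SA[row] by walking backwards until a sampled row is reached.
    pub fn resolve_row(&self, mut row: usize) -> usize {
        let mut steps = 0;
        while row % self.ratio != 0 {
            row = self.lf(row);
            steps += 1;
        }
        (self.sampled[row / self.ratio] + steps) % self.bwt.len()
    }

    /// Number of suffix array entries actually stored.
    pub fn sampled_len(&self) -> usize {
        self.sampled.len()
    }
}
```

In this sketch a ratio of 1 stores the full suffix array, while a very large ratio stores almost nothing and pushes all of the cost onto locate-style lookups.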
To search for a query, use the count_string and locate_string functions:
/// Counts the occurrences of the given query in the original text.
pub fn count_string(&self, query: &String) -> u64 {
    ...
}
/// Finds the locations in the original text of all instances of the given query.
pub fn locate_string(&self, query: &String) -> Vec<u64> {
    ...
}
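To make the meaning of the two results concrete, here is a deliberately naive FM-index with the same count/locate split. It is a sketch under stated assumptions, not AWRY's implementation: ToyFmIndex and its methods are hypothetical names, the suffix array is built by plain sorting, and occurrence counts are linear scans rather than SIMD-accelerated tables.

```rust
/// Toy FM-index over a byte text; a sentinel byte 0 is appended internally
/// and is assumed not to occur in the input or in any query.
pub struct ToyFmIndex {
    bwt: Vec<u8>,
    suffix_array: Vec<usize>,
    /// c_table[b] = number of symbols in the text strictly smaller than b.
    c_table: [usize; 256],
}

impl ToyFmIndex {
    pub fn new(text: &[u8]) -> Self {
        let mut t = text.to_vec();
        t.push(0);
        let n = t.len();
        // Naive suffix array: sort suffix start positions lexicographically.
        let mut suffix_array: Vec<usize> = (0..n).collect();
        suffix_array.sort_by(|&a, &b| t[a..].cmp(&t[b..]));
        // BWT[i] is the symbol preceding suffix i, wrapping around.
        let bwt: Vec<u8> = suffix_array.iter().map(|&p| t[(p + n - 1) % n]).collect();
        let mut counts = [0usize; 256];
        for &b in &t {
            counts[b as usize] += 1;
        }
        let mut c_table = [0usize; 256];
        let mut sum = 0;
        for b in 0..256 {
            c_table[b] = sum;
            sum += counts[b];
        }
        Self { bwt, suffix_array, c_table }
    }

    /// Occurrences of `symbol` in bwt[..end] (linear scan for clarity).
    fn occ(&self, symbol: u8, end: usize) -> usize {
        self.bwt[..end].iter().filter(|&&b| b == symbol).count()
    }

    /// Backward search: half-open suffix array range of suffixes that start with `query`.
    fn search_range(&self, query: &[u8]) -> (usize, usize) {
        let (mut start, mut end) = (0, self.bwt.len());
        for &symbol in query.iter().rev() {
            start = self.c_table[symbol as usize] + self.occ(symbol, start);
            end = self.c_table[symbol as usize] + self.occ(symbol, end);
            if start >= end {
                return (0, 0);
            }
        }
        (start, end)
    }

    /// Number of occurrences: only the width of the range is needed.
    pub fn count(&self, query: &[u8]) -> u64 {
        let (start, end) = self.search_range(query);
        (end - start) as u64
    }

    /// Text positions of all occurrences, sorted ascending.
    pub fn locate(&self, query: &[u8]) -> Vec<u64> {
        let (start, end) = self.search_range(query);
        let mut hits: Vec<u64> = self.suffix_array[start..end].iter().map(|&p| p as u64).collect();
        hits.sort_unstable();
        hits
    }
}
```

Note that count only needs the size of the matching range, while locate has to read suffix array entries, which is why the two operations have different memory requirements.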
To search for a large number of queries, parallelize the work with the parallel_count and parallel_locate functions:
/// Counts the occurrences of each query in the query list.
pub fn parallel_count(&self, queries: &Vec<String>) -> Vec<u64> {
    ...
}
/// Finds the locations for each query in the query list. This function uses rayon's into_par_iter() for parallelism.
pub fn parallel_locate(&self, queries: &Vec<String>) -> Vec<Vec<u64>> {
    ...
}
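The shape of these functions, one result per query returned in query order, can be sketched without the crate. The example below is an illustration only: it swaps rayon for std::thread::scope so that it has no dependencies, and naive_count and parallel_count_sketch are hypothetical stand-ins rather than AWRY functions.

```rust
use std::thread;

/// Naive overlapping substring count, standing in for an index's count function.
pub fn naive_count(text: &str, query: &str) -> u64 {
    if query.is_empty() || query.len() > text.len() {
        return 0;
    }
    text.as_bytes()
        .windows(query.len())
        .filter(|window| *window == query.as_bytes())
        .count() as u64
}

/// Splits `queries` into contiguous chunks, counts each chunk on its own
/// thread, and concatenates the results so they stay in query order.
pub fn parallel_count_sketch(text: &str, queries: &[String], num_threads: usize) -> Vec<u64> {
    let threads = num_threads.max(1);
    // Ceiling division; at least 1 so that `chunks` never receives 0.
    let chunk_size = ((queries.len() + threads - 1) / threads).max(1);
    thread::scope(|scope| {
        // Spawn every worker before joining any of them.
        let handles: Vec<_> = queries
            .chunks(chunk_size)
            .map(|chunk| {
                scope.spawn(move || {
                    chunk.iter().map(|q| naive_count(text, q)).collect::<Vec<u64>>()
                })
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Because the index is only read during a search, every worker can share the same immutable reference, which is what makes this kind of per-query parallelism straightforward.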