# edlib_rs

This crate provides a Rust interface to the Edlib C++ library by Martin Šošić. See [Martinsos-edlib](https://github.com/Martinsos/edlib).

The reference paper is:

Martin Šošić, Mile Šikić; Edlib: a C/C++ library for fast, exact sequence alignment using edit distance. Bioinformatics 2017 [btw753. doi]

The crate offers 2 interfaces to edlib.

The first, accessed via module bindings, is directly the interface generated by the bindgen crate.

The second, accessed via module edlibrs, provides a more idiomatic Rust interface. It comes at the cost of cloning the information stored in the pointers startLocations and endLocations of the C **struct EdlibAlignResult** to get a Rust **struct EdlibAlignResultRs** with **Option** fields instead of pointers. The cigar string representation is also cloned when computed. As a consequence, memory management is fully transferred to Rust.

Structures and functions have the same names as in edlib, with just "Rs" appended to the original names.

## Example

For the edlibrs interface we have, for example:

in normal mode:

```rust
use edlib_rs::edlibrs::*;
// ...
let query = "ACCTCTG";
let target = "ACTCTGAAA";
// global (NW) alignment with the default configuration
let align_res = edlibAlignRs(query.as_bytes(), target.as_bytes(), &EdlibAlignConfigRs::default());
assert_eq!(align_res.status, EDLIB_STATUS_OK);
assert_eq!(align_res.editDistance, 4);
```

in the infix mode:

```rust
use edlib_rs::edlibrs::*;
// ...
let query = "ACCTCTG";
let target = "TTTTTTTTTTTTTTTTTTTTTACTCTGAAA";
// switch from the default mode to infix (HW) mode
let mut config = EdlibAlignConfigRs::default();
config.mode = EdlibAlignModeRs::EDLIB_MODE_HW;
let align_res = edlibAlignRs(query.as_bytes(), target.as_bytes(), &config);
assert_eq!(align_res.editDistance, 1);
```

## Installation

The package has the original Edlib library sources embedded in the source tree (see directory **edlib-c**, corresponding to the sources as of December 2020), minus the original test_data directory to limit the size of the crate. The standard "cargo build" command runs Edlib's cmake build.

The crate enables a logger to monitor the calls to the C interface. Its level is set by default in Cargo.toml to *info* for release mode and *trace* for debug mode, but it can be changed by setting the variable RUST_LOG (see the env_logger doc).

## Tests

Some tests in module edlib.rs can serve as basic examples.

The directory examples also contains a small version of the edlib edaligner module (see apps/aligner in the edlib installation dir). It runs on Fasta files containing only one sequence, such as those in the *test_data* directory of the original **edlib** distribution. As the embedded sources do not contain the original test_data sub-directory, it must be downloaded separately to run the edaligner example module.

Unlike the edlib version, this module runs the 3 modes (normal/NW, prefix/SHW and infix/HW) in one pass for a given query and target sequence.

With

*RUST_LOG=info ./target/release/examples/edaligner --dirdata "$edlibpath/test_data/Enterobacteria_Phage_1" --tf "Enterobacteria_phage_1.fasta" --qf "mutated_90_perc.fasta"*

we get the following timings in release mode for Enterobacteria_phage_1.fasta as target sequence and mutated_90_perc.fasta as query sequence.

| mode | edlibrs time (s) | edlib time (s) | distance |
| :---: | :---: | :------: | :----: |
| NW | 0.106 | 0.106 | 9506 |
| SHW | 0.184 | 0.191 | 9502 |
| HW | 0.682 | 0.695 | 9502 |

We get the following timings in release mode for Enterobacteria_phage_1.fasta as target sequence and mutated_60_perc.fasta as query sequence.

| mode | edlibrs time (s) | edlib time (s) | distance |
| :---: | :---: | :------: | :----: |
| NW | 0.398 | 0.398 | 39829 |
| SHW | 0.670 | 0.684 | 39828 |
| HW | 1.182 | 1.206 | 39828 |

Apart from negligible variations in the CPU time measurement, the computation times of the two versions are the same.

## License

Licensed under either of

* Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE))
* MIT license ([LICENSE-MIT](LICENSE-MIT))

at your option.

This software was written on my own while working at [CEA](http://www.cea.fr/), [CEA-LIST](http://www-list.cea.fr/en/)