Crates.io | prot_translate |
lib.rs | prot_translate |
version | 0.1.0 |
source | src |
created_at | 2024-04-15 17:08:43.581295 |
updated_at | 2024-04-15 17:08:43.581295 |
description | Translate nucleotide sequence to protein. |
homepage | https://github.com/DorianCoding/prot_translate |
repository | https://github.com/DorianCoding/prot_translate |
id | 1209459 |
size | 45,592 |
Translate a nucleotide sequence (DNA or RNA) to protein.
Add this to your Cargo.toml:
[dependencies]
prot_translate = "0.1.0"
use prot_translate::*;

fn main() {
    let dna = b"GTGAGTCGTTGAGTCTGATTGCGTATC";
    let protein = translate(dna);
    assert_eq!("VSR*V*LRI", &protein);

    // To shift the reading frame, skip the first base of the same sequence
    let protein_frame2 = translate(&dna[1..]);
    assert_eq!("*VVESDCV", &protein_frame2);

    // Three-letter amino acid codes
    let dna = b"GCTAGTCGTATCGTAGCTAGTC";
    let peptide = translate3(dna, None);
    assert_eq!(&peptide, "AlaSerArgIleValAlaSer");

    // Full amino acid names
    let peptide = translate_full(dna, None);
    assert_eq!(&peptide, "AlanineSerineArginineIsoleucineValineAlanineSerine");
}
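The frame shift in the example (`&dna[1..]`) is plain slice arithmetic, so it extends to all three forward reading frames. The helper below is a minimal sketch of that idea; `forward_frames` is a hypothetical name and is not part of prot_translate. Each returned slice could then be passed to `translate`.

```rust
// Hypothetical helper, not part of the prot_translate API.
// Returns the three forward reading frames of `seq` as sub-slices
// starting at offsets 0, 1 and 2. No bytes are copied.
fn forward_frames(seq: &[u8]) -> [&[u8]; 3] {
    // Clamp the offset so sequences shorter than 2 bases do not panic.
    let start = |offset: usize| offset.min(seq.len());
    [&seq[start(0)..], &seq[start(1)..], &seq[start(2)..]]
}
```

Because the frames borrow from the input, building them costs nothing beyond three pointer/length pairs. Reverse-strand frames would additionally need a reverse complement, which this sketch does not cover.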
The current algorithm is inspired by SeqAn's implementation, which uses array indexing. Here is how it performs against other methods (tested on a 2012 MacBook Pro).
Method | 10 bp* | 100 bp | 1,000 bp | 10,000 bp | 100,000 bp | 1 million bp |
---|---|---|---|---|---|---|
prot_translate | 91 ns | 0.29 μs | 2.28 μs | 23 μs | 215 μs | 2.25 ms |
fnv hashmap | 111 ns | 0.37 μs | 3.58 μs | 37 μs | 366 μs | 3.86 ms |
std hashmap | 160 ns | 1.03 μs | 9.65 μs | 100 μs | 943 μs | 9.40 ms |
phf_map | 177 ns | 1.04 μs | 9.47 μs | 100 μs | 936 μs | 9.91 ms |
match statement | 259 ns | 1.77 μs | 17.9 μs | 163 μs | 1941 μs | 19.1 ms |
prot_translate (unchecked) | 90 ns | 0.26 μs | 2.02 μs | 20 μs | 197 μs | 1.92 ms |
*bp = "base pairs"
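To illustrate why array indexing can beat hashing or a large `match`, here is a minimal sketch of the general technique. It is not the crate's source: the names `CODON_TABLE`, `BASE_INDEX`, and `translate_sketch` are made up for this example, and emitting `X` for codons with unrecognized bytes is an assumption, not documented prot_translate behavior. Each base maps to a number 0 to 3, and the three numbers form an index into a 64-entry table of the standard genetic code.

```rust
// Illustrative sketch only; not the prot_translate implementation.
// Standard genetic code (NCBI table 1) with bases ordered T, C, A, G.
// A codon's index is 16*b1 + 4*b2 + b3, where T/U=0, C=1, A=2, G=3.
const CODON_TABLE: &[u8; 64] =
    b"FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

// Marker for any byte that is not T, U, C, A or G (either case).
const INVALID: u8 = 4;

// 256-entry table mapping a byte to its base index, built at compile time.
const BASE_INDEX: [u8; 256] = {
    let mut table = [INVALID; 256];
    table[b'T' as usize] = 0;
    table[b't' as usize] = 0;
    table[b'U' as usize] = 0;
    table[b'u' as usize] = 0;
    table[b'C' as usize] = 1;
    table[b'c' as usize] = 1;
    table[b'A' as usize] = 2;
    table[b'a' as usize] = 2;
    table[b'G' as usize] = 3;
    table[b'g' as usize] = 3;
    table
};

// Translates complete codons; a trailing partial codon is ignored.
// Assumption: codons containing an unrecognized byte become 'X'.
fn translate_sketch(seq: &[u8]) -> String {
    let mut protein = String::with_capacity(seq.len() / 3);
    for codon in seq.chunks_exact(3) {
        let b1 = BASE_INDEX[codon[0] as usize];
        let b2 = BASE_INDEX[codon[1] as usize];
        let b3 = BASE_INDEX[codon[2] as usize];
        if b1 == INVALID || b2 == INVALID || b3 == INVALID {
            protein.push('X');
            continue;
        }
        let index = 16 * b1 as usize + 4 * b2 as usize + b3 as usize;
        protein.push(CODON_TABLE[index] as char);
    }
    protein
}
```

Calling `translate_sketch(b"GTGAGTCGTTGAGTCTGATTGCGTATC")` yields `"VSR*V*LRI"`, matching the usage example. Each codon costs three table loads and one more indexed load, with no hashing and no branching on codon contents beyond the validity check.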
To run the benchmarks yourself (nightly is required because of the phf_map macro):
cargo +nightly bench
An earlier version included a function, translate_unchecked, that did not validate each byte for valid ASCII, but since the performance gain was negligible, it was removed.
To run the tests:
cargo test
You can also generate new test data (requires python3 and biopython):
# Generate 500 random sequences and their peptides
python3 tests/generate_test_data.py 500