Crates.io | bigsig |
lib.rs | bigsig |
version | 0.1.0 |
source | src |
created_at | 2024-08-30 23:49:09.106214 |
updated_at | 2024-08-30 23:49:09.106214 |
description | Large-scale Sequence Search with BItsliced Genomic Signature Index (BIGSIG) |
id | 1358440 |
size | 196,929 |
This is a port of the colorid crate with several updates for real-world applications.
Credit for the original implementation goes to the original authors.
git clone https://gitlab.com/Jianshu_Zhao/bigsig
cd bigsig
cargo build --release
************** initializing logger *****************
bigsig 0.1.0
Large-scale Sequence Search with BItsliced Genomic Signature Index (BIGSIG)
USAGE:
bigsig [SUBCOMMAND]
FLAGS:
-h, --help Prints help information
-V, --version Prints version information
SUBCOMMANDS:
batch_identify Identify batch of samples reads
construct Construct a BIGSIG
filter filters reads
help Prints this message or the help of the given subcommand(s)
identify identify reads based on probability
query query a bigsig on one or more fasta/fastq.gz files
show show index parameters
An example of building and querying a BigSig database:
bigsig construct -r ref_file_example.txt -b test -k 31 -mv 21 -s 10000000 -n 4 -t 24
bigsig query -b ./test.mxi -q ./test_data/test.fastq.gz
bigsig identify -b test.mxi -q ./test_data/test.fastq.gz -n output -t 24 --high_mem_load
With the default settings, BigSig reports reference sequences that share more than 35% of their k-mers with the query. Here is the output of a query with SRA accession SRR4098796 (L. monocytogenes lineage I) as the query:
SRR4098796_1.fastq.gz 3076072 Listeria_monocytogenes_F2365 0.87 134.25 126 475266
SRR4098796_1.fastq.gz 3076072 Listeria_monocytogenes_SRR2167842 0.40 128.25 122 7831
The columns are, in order: the query file; the number of k-mers in the query; the reference sequence; the proportion of k-mers in the reference that are shared with the query; the average coverage, based on k-mers that matched this reference uniquely; the mode of the coverage, based on the same uniquely matched k-mers; and the number of uniquely matched k-mers.
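For downstream processing, the seven columns described above can be read into named fields. The following is a minimal Python sketch, not part of bigsig itself. It assumes the report is whitespace-separated (tabs or spaces) and that file and reference names contain no spaces. The names `Hit`, `parse_hit` and `parse_report` are hypothetical helpers introduced for illustration only.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """One report line; field names are illustrative, not defined by bigsig."""
    query: str               # query file name
    query_kmers: int         # number of k-mers in the query
    reference: str           # reference sequence name
    shared_fraction: float   # proportion of reference k-mers shared with the query
    mean_coverage: float     # average coverage over uniquely matched k-mers
    modal_coverage: float    # mode of the coverage over uniquely matched k-mers
    unique_kmers: int        # number of uniquely matched k-mers


def parse_hit(line: str) -> Hit:
    # Assumption: fields are separated by whitespace and names contain no spaces.
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 columns, got {len(fields)}: {line!r}")
    return Hit(
        query=fields[0],
        query_kmers=int(fields[1]),
        reference=fields[2],
        shared_fraction=float(fields[3]),
        mean_coverage=float(fields[4]),
        modal_coverage=float(fields[5]),
        unique_kmers=int(fields[6]),
    )


def parse_report(text: str) -> List[Hit]:
    # Skip blank lines; every remaining line is parsed as one hit.
    return [parse_hit(line) for line in text.splitlines() if line.strip()]


# The two example lines shown above.
EXAMPLE = """\
SRR4098796_1.fastq.gz 3076072 Listeria_monocytogenes_F2365 0.87 134.25 126 475266
SRR4098796_1.fastq.gz 3076072 Listeria_monocytogenes_SRR2167842 0.40 128.25 122 7831
"""

hits = parse_report(EXAMPLE)
# Pick the reference that shares the largest proportion of its k-mers.
best = max(hits, key=lambda h: h.shared_fraction)
print(best.reference, best.shared_fraction)
```

Sorting or filtering on `shared_fraction` and `unique_kmers` together may be useful, since a reference can pass the reporting threshold while having comparatively few uniquely matched k-mers, as the second line of the example shows.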