Crates.io | alpaca |
lib.rs | alpaca |
version | 0.1.0 |
source | src |
created_at | 2015-06-11 17:30:38.297169 |
updated_at | 2015-12-11 23:57:25.953017 |
description | ALPACA is a caller for genomic variants (single nucleotide variants and small indels) from next-generation sequencing data. It uses a novel algebraic approach to incorporate sample-based filtering into the calling, which makes it possible to intuitively control the FDR for arbitrary filtering scenarios. |
homepage | https://github.com/johanneskoester/alpaca |
id | 2360 |
size | 814,013 |
ALPACA is a caller for genomic variants (single nucleotide and small indels) from next-generation sequencing data. It has two major distinguishing features compared to other variant callers:
ALPACA separates calling into three steps.
This separation makes it possible to add samples later without redoing all the computations. Since most of the work is done during preprocessing, the final calling step is lightweight and can be repeated with different parameters within seconds. The algebraic query language makes it possible to model calling scenarios flexibly, e.g.,
A complete description of algebraic variant calling can be found in my thesis:
Köster, J. Parallelization, Scalability, and Reproducibility in Next-Generation Sequencing Analysis. PhD thesis, TU Dortmund, Germany, 2014. ISBN: 978-3737537773.
If you use ALPACA, please cite the thesis for now.
All in one command:
$ alpaca preprocess --threads 8 A.bam B.bam C.bam | alpaca filter | alpaca call --fdr 0.05 'A - (B + C)' > calls.bcf
Separate preprocessing and merging (this makes it possible to add samples or change queries without redundant computations; alpaca call usually takes only a few seconds):
$ alpaca preprocess --threads 8 A.bam > A.bcf
$ alpaca preprocess --threads 8 B.bam > B.bcf
$ alpaca preprocess --threads 8 C.bam > C.bcf
$ alpaca merge --threads 8 A.bcf B.bcf C.bcf > all.bcf
$ alpaca call --threads 8 --fdr 0.05 'A - (B + C)' < all.bcf > calls.bcf
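Because the calling step is cheap once all.bcf exists, it can be repeated for several FDR thresholds without touching the BAM files again. The following is a minimal sketch of such a sweep, not part of ALPACA itself. It uses only the `alpaca call` invocation shown above; the `DRY_RUN` switch, the `call_cmd` helper, the chosen thresholds, and the `calls.fdr_<value>.bcf` output names are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: repeat the lightweight calling step for several FDR thresholds,
# reusing the merged all.bcf produced by `alpaca merge`.
# DRY_RUN, call_cmd and the output file naming are hypothetical helpers.
DRY_RUN="${DRY_RUN:-1}"   # 1: only print the commands, 0: execute them

# Build the `alpaca call` command line for one FDR threshold and one query.
call_cmd() {
    fdr="$1"
    query="$2"
    printf "alpaca call --threads 8 --fdr %s '%s' < all.bcf > calls.fdr_%s.bcf\n" \
        "$fdr" "$query" "$fdr"
}

# Thresholds chosen for illustration only.
for fdr in 0.01 0.05 0.1; do
    cmd=$(call_cmd "$fdr" 'A - (B + C)')
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$cmd"
    else
        sh -c "$cmd"
    fi
done
```

By default the script only prints the three commands so they can be inspected first; running it with `DRY_RUN=0` executes them and requires `alpaca` on the `PATH` and all.bcf in the working directory.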