Crates.io | ambigviz |
lib.rs | ambigviz |
version | 0.1.0 |
source | src |
created_at | 2024-04-27 09:40:42.382969 |
updated_at | 2024-04-27 09:40:42.382969 |
description | Identify and plot ambiguous nucleotide bases at given positions from a BAM file. |
homepage | |
repository | https://github.com/Sam-Sims/ambigviz |
max_upload_size | |
id | 1222452 |
size | 251,046 |
ambigviz is a tool for rapidly scanning and visualising ambiguous/mixed bases at given positions in a BAM file. It was initially written to examine intrahost diversity / co-infection in SARS-CoV-2 samples; however, it can be used for any BAM file (and scales well thanks to Rust). It uses strict filtering options by default to avoid flagging sequencing artifacts, contamination and sequencing errors. The idea is to rapidly produce plots that are "presentation-ready" for further communication and discussion.
It provides a simple command line interface and, at minimum, requires only a BAM file.
Requires cargo. To install cargo, please refer to the Rust documentation.
cargo install ambigviz
git clone https://github.com/Sam-Sims/ambigviz
cd ambigviz
cargo build --release
export PATH=$PATH:$(pwd)/target/release
All executables will be in the directory ambigviz/target/release.
ambigviz ambig <path_to_bam> <region> [options]
At minimum all you need is a BAM file. ambigviz will look for a .bai file in the same directory with the pattern [input].bai. If one can't be found, ambigviz will attempt to index the BAM file for you. If this fails, you can index the BAM file yourself using samtools.
By default, if no region is provided, the entire BAM file will be scanned for ambiguous bases. Combined with the default threshold of 20% ambiguity, this provides an easy way to quickly scan a BAM file for sequencing errors, contamination or co-infection and flag regions of interest for further investigation.
Regions follow the samtools format chr:start-end, and all positions are 1-based.
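As an illustration of the region format, here is a minimal Python sketch that parses a chr:start-end string and checks that it is a valid 1-based, inclusive interval. The names `Region` and `parse_region` are hypothetical and are not part of ambigviz; the tool's own parsing may differ (for example in how it handles a bare chromosome name).

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def parse_region(text: str) -> Region:
    """Parse a samtools-style 'chr:start-end' region string."""
    # Greedy '.+' means the split happens at the last colon,
    # so contig names that contain ':' are still handled.
    match = re.fullmatch(r"(.+):(\d+)-(\d+)", text)
    if match is None:
        raise ValueError(f"expected chr:start-end, got {text!r}")
    chrom, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    if start < 1 or end < start:
        raise ValueError(f"invalid 1-based interval in {text!r}")
    return Region(chrom, start, end)


region = parse_region("MN908947.3:21563-25384")
# Number of positions covered by an inclusive 1-based interval.
length = region.end - region.start + 1
```

Because both ends are inclusive, the region length is end - start + 1 rather than end - start.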
The --bed option allows you to output identified ambiguous positions to a BED file.
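Note that the BED format uses 0-based, half-open coordinates, whereas the positions used on the command line are 1-based. The sketch below shows the standard conversion for a single position. It assumes a plain three-column BED line; the exact columns ambigviz writes are not described here, and `bed_line` is a hypothetical helper.

```python
def bed_line(chrom: str, position: int) -> str:
    """Format one 1-based position as a three-column BED line."""
    if position < 1:
        raise ValueError("positions are 1-based and must be >= 1")
    # BED start is 0-based and inclusive; BED end is exclusive.
    return f"{chrom}\t{position - 1}\t{position}"


line = bed_line("MN908947.3", 23403)
```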
You can also plot the depth of a BAM file using the depth command.
ambigviz depth <path_to_bam> <region> [options]
-o, --output <output>
This option will set the output file name. Default is to use the chromosome name.
-t, --threshold <threshold>
This option will set the threshold for the proportion of ambiguous bases at a given position. Any position exceeding this threshold will be plotted. The maximum value is 0.5 (50%). Default is 0.2 (20%).
--no-indel
By default, indels are included in the ambiguous base count. This option will exclude them. This is useful for noisy data like ONT.
-D, --depth <depth>
This option will set the minimum total depth for a position to be included in the plot. Default is 100.
-d, --minor-depth <minor-depth>
This option will set the minimum depth of the minor allele for a position to be included in the plot. Default is 20.
-q, --base-quality <base-quality>
This option will set the minimum base quality for a base to be included in the plot. Default is 20.
-Q, --mapping-quality <mapping-quality>
This option will set the minimum mapping quality for a read to be included in the plot. Default is 60.
-s, --strand-bias <strand-bias>
This option will set the strand bias threshold for a position to be included in the plot. Default is 0.1 (10%). This means that at least 10% of the depth must come from each strand. For example, if you had a position with 80 reads of A and 20 reads of C, at least 2 of the C reads must come from each strand for the position to be flagged as ambiguous and included in the plot. The maximum value is 0.5, which requires equal depth from each strand. Setting it to 0 will disable this filter.
--bed
This option will output the identified ambiguous positions to a BED file. By default no BED file is written.
--no-labels
By default, the proportion of each base at each position is included as a text annotation. This option will exclude them.
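To make the interaction between the threshold, depth, minor-depth and strand-bias options concrete, here is a minimal Python sketch of how such filters could be combined for a single position. It is not ambigviz's implementation. It assumes that the "proportion of ambiguous bases" is the fraction of reads carrying the second most common allele, that the strand check applies to that minor allele (as in the 80 A / 20 C example), and that a value exactly at the threshold passes. `AlleleCount` and `is_ambiguous` are hypothetical names, and base quality, mapping quality and indel handling are left out.

```python
from dataclasses import dataclass


@dataclass
class AlleleCount:
    forward: int  # reads supporting the allele on the forward strand
    reverse: int  # reads supporting the allele on the reverse strand

    @property
    def total(self) -> int:
        return self.forward + self.reverse


def is_ambiguous(
    counts: dict[str, AlleleCount],
    threshold: float = 0.2,
    min_depth: int = 100,
    min_minor_depth: int = 20,
    strand_bias: float = 0.1,
) -> bool:
    """Return True if a position passes every filter (illustrative only)."""
    depth = sum(c.total for c in counts.values())
    if depth < min_depth or len(counts) < 2:
        return False
    # Assumption: the minor allele is the second most common one.
    minor = sorted(counts.values(), key=lambda c: c.total, reverse=True)[1]
    if minor.total < min_minor_depth:
        return False
    # Assumption: a proportion exactly equal to the threshold passes.
    if minor.total / depth < threshold:
        return False
    # Each strand must hold at least strand_bias of the minor allele reads;
    # a strand_bias of 0 disables the check.
    if strand_bias > 0 and min(minor.forward, minor.reverse) < strand_bias * minor.total:
        return False
    return True


# 80 reads of A and 20 reads of C, with 2 of the C reads on the forward strand.
balanced = {"A": AlleleCount(40, 40), "C": AlleleCount(2, 18)}
# Same depth, but only 1 C read on the forward strand.
biased = {"A": AlleleCount(40, 40), "C": AlleleCount(1, 19)}
flags = (is_ambiguous(balanced), is_ambiguous(biased))
```

Under these assumptions the first position is flagged and the second is rejected by the strand filter, matching the worked example given for --strand-bias.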