Crates.io | scattr |
lib.rs | scattr |
version | 0.2.1 |
source | src |
created_at | 2022-08-01 17:07:27.470804 |
updated_at | 2024-10-25 23:56:38.919458 |
description | A tool for estimating the copy number of large tandem repeats |
homepage | |
repository | https://github.com/rashidalabri/scattr |
max_upload_size | |
id | 636952 |
size | 3,512,984 |
ScatTR is a method for estimating the copy number of large tandem repeats (TRs) from paired-end short-read whole-genome sequencing (WGS) data.
Binaries for the latest version of ScatTR can be found on the releases page.
If you have `cargo` installed, you can also install ScatTR from crates.io (`cargo` can be installed by following this guide):

```sh
cargo install scattr
```
Below are instructions on how to use the tool, along with descriptions of the available commands, arguments, and options.
Refer to the example directory for a comprehensive overview of the workflow, along with explanations of the input files and their formats. In general, to run the complete ScatTR workflow:
```sh
scattr output/sample stats input/sample.bam
scattr output/sample extract input/sample.bam input/catalog.tsv
scattr output/sample define input/catalog.tsv input/reference.fa
scattr output/sample genotype
```
Each command will produce output files with the given prefix. The commands must be run in the given order since each step will expect the outputs of the previous steps (with the same output prefix).
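Because the steps must run in order with a shared prefix, it can help to script them so that a failing step stops the run. The following is a minimal sketch, assuming Python 3.8+ and a `scattr` binary on `PATH`; `workflow_commands` and `run_workflow` are hypothetical helper names, and the argument order simply mirrors the commands listed above.

```python
import shlex
import subprocess
from pathlib import Path

# The four ScatTR steps, in the order they must run. Every step reuses the
# same output prefix so it can find the outputs of the steps before it.
def workflow_commands(prefix, alignment, catalog, reference, scattr="scattr"):
    return [
        [scattr, prefix, "stats", alignment],
        [scattr, prefix, "extract", alignment, catalog],
        [scattr, prefix, "define", catalog, reference],
        [scattr, prefix, "genotype"],
    ]

def run_workflow(prefix, alignment, catalog, reference, dry_run=False):
    commands = workflow_commands(prefix, alignment, catalog, reference)
    if not dry_run:
        # Assumption: create the prefix's directory up front in case
        # scattr does not create it itself.
        Path(prefix).parent.mkdir(parents=True, exist_ok=True)
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit, so a
            # failed step stops the run before later steps look for its outputs.
            subprocess.run(argv, check=True)
    return commands
```

For example, `run_workflow("output/sample", "input/sample.bam", "input/catalog.tsv", "input/reference.fa", dry_run=True)` prints the four commands without executing them.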
ScatTR is run with the following basic command structure:
```sh
scattr <OUTPUT_PREFIX> [COMMAND] [OPTIONS] <INPUTS>
```
<OUTPUT_PREFIX>
: Prefix for output files, including the directory where outputs will be saved.

[COMMAND]
: Specifies the step of the workflow to perform (e.g., `stats`, `extract`, `define`, `genotype`).

[OPTIONS]
: Optional parameters that modify the behavior of each command.

stats
This command extracts the read depth and insert size distribution from an alignment file.
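The sampling controlled by `-n, --num-regions` (default: 100) and `-l, --region-length` (default: 100000) can be illustrated with a generic sketch: pick random fixed-length windows, convert aligned bases into mean depth, and summarize insert sizes. This is not ScatTR's implementation. The helper names and inputs (a dict of contig lengths, per-region aligned-base counts, and SAM `TLEN` values) are hypothetical, and the MAPQ and other filters set by options such as `--min-depth-mapq` are omitted.

```python
import random
import statistics

def sample_regions(contig_lengths, num_regions=100, region_length=100_000, seed=0):
    """Pick random windows, uniformly over every valid start position."""
    rng = random.Random(seed)
    eligible = {c: n for c, n in contig_lengths.items() if n >= region_length}
    if not eligible:
        raise ValueError("no contig is long enough for the requested region length")
    names = list(eligible)
    # Weight each contig by its number of valid window start positions.
    weights = [eligible[c] - region_length + 1 for c in names]
    regions = []
    for _ in range(num_regions):
        contig = rng.choices(names, weights=weights)[0]
        start = rng.randrange(0, eligible[contig] - region_length + 1)
        regions.append((contig, start, start + region_length))
    return regions

def mean_depth(aligned_bases_per_region, region_length):
    """Per-region depth is aligned bases / region length; average over regions."""
    return statistics.mean(b / region_length for b in aligned_bases_per_region)

def insert_size_summary(template_lengths):
    """Mean and sample standard deviation of insert sizes from SAM TLEN values.
    Each pair appears twice with opposite signs, so keeping only positive
    values counts every pair once (TLEN 0 means unavailable)."""
    sizes = [t for t in template_lengths if t > 0]
    return statistics.mean(sizes), statistics.stdev(sizes)
```

Sampling windows instead of scanning the whole genome keeps the cost proportional to `num_regions * region_length`, which is presumably why these values are exposed as options.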
Usage:
```sh
scattr <OUTPUT_PREFIX> stats [OPTIONS] <ALIGNMENT>
```
<ALIGNMENT>
: Path to the alignment file (BAM or CRAM).

-n, --num-regions <NUM_REGIONS>
: Number of regions to sample (default: 100).

-l, --region-length <REGION_LENGTH>
: Length of each sampled region (default: 100000).

-@ <THREADS>
: Number of HTSlib threads to use for reading the alignment file.

Additional options, such as `--min-depth-mapq`, `--min-insert-mapq`, `--dc`, and `--ds`, customize depth and insert size filtering. Use the `--help` option to view details.

extract
This command extracts the bag of reads, which are read-pairs that are likely to originate from the TR loci specified in the catalog.
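To make the idea of a "bag of reads" concrete, here is a deliberately simplified sketch that groups read pairs by the catalog locus they are placed near. The selection rule (keep a pair if either mate is confidently mapped within a flank of the locus) is a hypothetical stand-in, not ScatTR's actual criterion; the real behavior is controlled by options such as `-e`, `--mapq-f`, and `--mapq-i`, whose semantics are documented by `--help`. The `Mate`, `Locus`, and `extract_bag` names are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Mate:
    contig: Optional[str]  # None if the mate is unmapped
    pos: int               # 0-based leftmost aligned position
    mapq: int

@dataclass
class Locus:
    contig: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive

def near_locus(mate, locus, flank, min_mapq):
    """True if a confidently mapped mate starts inside the locus or its flanks."""
    return (
        mate.contig == locus.contig
        and mate.mapq >= min_mapq
        and locus.start - flank <= mate.pos < locus.end + flank
    )

def extract_bag(pairs, loci, flank=1000, min_mapq=20):
    """Group read pairs (tuples of two Mates) by locus index.
    Hypothetical rule: keep a pair if either mate passes near_locus."""
    bags = {i: [] for i in range(len(loci))}
    for pair in pairs:
        for i, locus in enumerate(loci):
            if any(near_locus(m, locus, flank, min_mapq) for m in pair):
                bags[i].append(pair)
    return bags
```

This linear scan checks every pair against every locus; a practical tool would instead query an indexed BAM/CRAM for the reads overlapping each locus region.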
Usage:
```sh
scattr <OUTPUT_PREFIX> extract [OPTIONS] <ALIGNMENT> <CATALOG>
```
<ALIGNMENT>
: Path to the alignment file (SAM, BAM or CRAM).

<CATALOG>
: Path to the tandem repeat catalog file (TSV format).

-@ <THREADS>
: Number of HTSlib threads to use for reading the alignment file.

Additional options, such as `-e`, `--mapq-f`, and `--mapq-i`, customize read extraction behavior. Use the `--help` option to view details.

define
This command generates definitions of the optimization problems needed to estimate the copy number of the TR loci in the catalog. It requires the output of the `extract` command.
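Since `define` requires a `.fai` index next to the reference FASTA (for `reference.fa`, the file `reference.fa.fai`, which `samtools faidx reference.fa` creates), a pre-flight check can catch a missing index before a run. A minimal sketch, assuming Python 3; `fai_path`, `check_reference`, and `read_fai` are hypothetical helpers, not part of ScatTR. The parser follows the standard samtools `.fai` layout: tab-separated name, length, byte offset, bases per line, and bytes per line.

```python
from pathlib import Path

def fai_path(reference):
    """samtools faidx writes the index next to the FASTA as <reference>.fai."""
    return Path(str(reference) + ".fai")

def check_reference(reference):
    """Raise a helpful error if the FASTA or its .fai index is missing."""
    ref = Path(reference)
    if not ref.is_file():
        raise FileNotFoundError(f"reference FASTA not found: {ref}")
    idx = fai_path(ref)
    if not idx.is_file():
        raise FileNotFoundError(f"missing index {idx}; create it with: samtools faidx {ref}")
    return idx

def read_fai(index_path):
    """Return {sequence name: length} from the first two .fai columns."""
    lengths = {}
    with open(index_path) as fh:
        for line in fh:
            if line.strip():
                name, length = line.rstrip("\n").split("\t")[:2]
                lengths[name] = int(length)
    return lengths
```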
Usage:
```sh
scattr <OUTPUT_PREFIX> define [OPTIONS] <CATALOG> <REFERENCE>
```
<CATALOG>
: Path to the TR catalog file (TSV format).

<REFERENCE>
: Path to the reference genome (FASTA format). An index file (`.fa.fai`) must also exist.

Additional options, such as `--bag` and `-n`, customize the problem definition process and input file paths. Use the `--help` option to view details.

genotype
This command estimates the copy number of each TR locus. It requires the outputs of the `stats` and `define` commands.
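The genotype confidence intervals come from bootstrapping (see `-n, --n-bootstraps`). As a generic illustration of a percentile bootstrap, and not ScatTR's exact procedure (what ScatTR resamples and which percentiles it reports are assumptions here), the sketch below resamples the data with replacement, re-runs an estimator on each resample, and reads the interval off the percentiles of the replicates. `bootstrap_ci` and `percentile` are hypothetical helpers.

```python
import random
import statistics

def percentile(sorted_vals, q):
    """Linearly interpolated percentile of an ascending list, q in [0, 100]."""
    k = (len(sorted_vals) - 1) * q / 100
    lo = int(k)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)

def bootstrap_ci(data, estimator, n_bootstraps=10, alpha=0.05, seed=0):
    """Point estimate plus a (1 - alpha) percentile bootstrap interval."""
    rng = random.Random(seed)
    point = estimator(data)
    # Each replicate re-runs the estimator on a same-size resample drawn
    # with replacement from the original data.
    replicates = sorted(
        estimator([rng.choice(data) for _ in data]) for _ in range(n_bootstraps)
    )
    lower = percentile(replicates, 100 * alpha / 2)
    upper = percentile(replicates, 100 * (1 - alpha / 2))
    return point, lower, upper
```

With only 10 replicates, the 2.5th and 97.5th percentiles are interpolated from the two smallest and two largest replicates, so the interval is coarse; more bootstraps give a smoother interval, presumably at the cost of re-running the estimation more times.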
Usage:
```sh
scattr <OUTPUT_PREFIX> genotype [OPTIONS]
```
--hom
: Assume that the TRs are homozygous during copy number optimization. Otherwise, the TRs are assumed to be heterozygous, with the normal allele being the same as the reference.

-n, --n-bootstraps
: Number of bootstraps used to estimate genotype confidence intervals (default: 10).

Additional options, such as `--stats` and `--defs`, customize TR copy number estimation parameters and input file paths.

-t, --threads <THREADS>
: Number of ScatTR threads to use. This is only useful for the `define` and `genotype` commands, which process loci in parallel.

Licensed under either of
at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
See CONTRIBUTING.md.