| Crates.io | predictosaurus |
| lib.rs | predictosaurus |
| version | 0.6.0 |
| created_at | 2024-12-04 14:39:06.294853+00 |
| updated_at | 2025-09-07 21:59:05.93776+00 |
| description | Uncertainty aware haplotype based genomic variant effect prediction |
| id | 1472128 |
| size | 6,791,420 |
[!CAUTION] This tool is still experimental! This means it may have bugs, and features are subject to change. Use it cautiously, and share feedback to help us improve. 🧪
Predictosaurus is a command-line tool for uncertainty-aware, haplotype-based prediction of genomic variant effects. It builds variant graphs from VCF files, scores the haplotypes of each transcript in a GFF file, extracts peptide sequences, and exports the resulting scores.
Predictosaurus can be installed via Bioconda:
conda install -c bioconda predictosaurus
Alternatively, you can use cargo, the Rust package manager:
cargo install predictosaurus
Run the tool from the command line using the following syntax:
predictosaurus <command> [options]
Use predictosaurus --help to view general help information, or predictosaurus <command> --help for specific command details.
All commands accept the --verbose/-v flag, which enables verbose output with logging information. The --threads/-t flag enables multi-threaded execution; if it is omitted or set to 0, the number of threads defaults to the number of available logical cores.
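The thread default can be illustrated with a small Python sketch. This is not the tool's implementation: the function name `effective_threads` is hypothetical, and `os.cpu_count()` is used only as a stand-in for "available logical cores".

```python
import os


def effective_threads(requested=None):
    """Resolve a --threads value the way the text describes (illustrative only)."""
    # An omitted value (None) or 0 falls back to the logical core count.
    if not requested:
        # os.cpu_count() can return None on unusual platforms; fall back to 1.
        return os.cpu_count() or 1
    return requested
```

For example, `effective_threads(4)` returns 4, while `effective_threads(0)` and `effective_threads()` both return the machine's logical core count.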
The build command builds a full variant graph from VCF files and stores it.
Options:
--calls <path>: Path to the VCF calls file.
--observations <sample=observations.vcf>: One or more observation files; sample names must match those in the calls file.
--min-prob-present <float>: Minimum probability for a variant to be considered during graph generation. Defaults to 0.8.
--min-vaf <float>: Minimum variant allele frequency (VAF) for a variant to be considered in the graph. The VAF must exceed this value in at least one of the given samples. Defaults to 0.05.
--output <path>: Path to store the generated variant graphs.
Example:
predictosaurus build --calls path/to/calls.vcf --observations sample1=path/to/observations1.vcf sample2=path/to/observations2.vcf --min-prob-present 0.65 --output path/to/output/graphs.duckdb
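The way the two thresholds combine can be sketched in Python. This is a reading of the option descriptions, not the tool's code: the function name `passes_filters` and its inputs are hypothetical, and the choice of an inclusive comparison for the probability and a strict one for the VAF ("higher") is an assumption.

```python
def passes_filters(prob_present, vaf_by_sample, min_prob_present=0.8, min_vaf=0.05):
    """Return True if a variant would be kept under both thresholds (illustrative)."""
    # Assumption: the probability threshold is inclusive.
    if prob_present < min_prob_present:
        return False
    # The VAF only needs to exceed the threshold in at least one sample.
    return any(vaf > min_vaf for vaf in vaf_by_sample.values())


# Kept: probable enough, and sample2 exceeds the default VAF threshold.
kept = passes_filters(0.9, {"sample1": 0.01, "sample2": 0.10})
# Dropped: no sample exceeds the default VAF threshold.
dropped = passes_filters(0.9, {"sample1": 0.01, "sample2": 0.02})
```

Lowering `min_prob_present` to 0.65, as in the example command, would also admit variants with a probability between 0.65 and 0.8.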
The process command retrieves subgraphs for individual features from the provided GFF file and calculates scores for all haplotypes of each transcript.
Options:
--features <path>: Path to the GFF file containing the features of interest.
--reference <path>: Path to the reference genome FASTA file.
--graph <path>: Path to the graph file generated by the build command.
--haplotype-metric <string>: Metric to use for haplotype quantification. Valid values: product, geometric-mean, minimum. Defaults to geometric-mean.
--output <path>: Path to the output file storing the calculated scores.
Example:
predictosaurus process --features path/to/features.gff --reference path/to/reference.fasta --graph path/to/graph.duckdb --output path/to/output/scores.duckdb
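The three values of --haplotype-metric correspond to standard aggregations, sketched below in Python. The README does not say which values the tool aggregates, so this example assumes a list of per-variant values along one haplotype; the function name `haplotype_score` is hypothetical.

```python
import math


def haplotype_score(values, metric="geometric-mean"):
    """Combine per-variant values into one haplotype score (illustrative)."""
    if not values:
        raise ValueError("at least one value is required")
    if metric == "product":
        return math.prod(values)
    if metric == "geometric-mean":
        # n-th root of the product, where n is the number of values.
        return math.prod(values) ** (1.0 / len(values))
    if metric == "minimum":
        return min(values)
    raise ValueError(f"unknown metric: {metric}")


values = [0.9, 0.4]
scores = {m: haplotype_score(values, m) for m in ("product", "geometric-mean", "minimum")}
```

For these two values the product is about 0.36, the geometric mean about 0.6, and the minimum 0.4. The product shrinks as more values are combined, whereas the geometric mean normalizes for the number of values.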
The peptides command extracts peptide sequences using the graph generated by the build command. Peptides are extracted for each feature in the provided GFF file and written to a FASTA file.
Options:
--features <path>: Path to the GFF file containing the features of interest.
--reference <path>: Path to the reference genome FASTA file.
--graph <path>: Path to the graph file generated by the build command.
--interval
--sample <str>: Name of the sample to extract peptides for.
--events <list of str>: List of events of interest.
--min-event-prob <float>: The probability of a peptide is calculated by first summing the probabilities of all events provided via --events for each variant that covers the peptide, as well as for any upstream variant that causes a frameshift. These summed probabilities are then multiplied together to determine the final probability of the peptide. The --min-event-prob
--background-events <list of str>: List of background events.
--max-background-event-prob <float>: Maximum probability of background events. For probability calculations check the --min-event-prob option.
--output <path>: Path to the output FASTA file storing the generated peptide sequences.
Example:
predictosaurus peptides --features path/to/features.gff --reference path/to/reference.fasta --graph path/to/graph.duckdb --interval 8-11 --sample sample1 --events event1 --events event2 --min-event-prob 0.8 --background-events event3 --max-background-event-prob 0.2 --output path/to/output/peptides.fasta
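The peptide probability described for --min-event-prob (sum the selected event probabilities per variant, then multiply across variants) can be worked through in Python. This is a sketch under assumptions: the function name `peptide_probability`, the data layout, and all probabilities are made up for illustration, and how the tool compares the result against the thresholds (inclusive or strict) is not stated in the README.

```python
import math


def peptide_probability(variants, events):
    """Sum the chosen event probabilities per variant, then multiply the sums.

    `variants` holds one dict (event name -> probability) for each variant
    covering the peptide and for each upstream frameshift variant.
    """
    per_variant_sums = [sum(v.get(e, 0.0) for e in events) for v in variants]
    return math.prod(per_variant_sums)


# Two hypothetical variants with made-up event probabilities.
variants = [
    {"event1": 0.6, "event2": 0.3, "event3": 0.1},
    {"event1": 0.5, "event2": 0.45, "event3": 0.05},
]
event_prob = peptide_probability(variants, ["event1", "event2"])  # 0.9 * 0.95
background_prob = peptide_probability(variants, ["event3"])       # 0.1 * 0.05

# Assumed comparison: keep the peptide if both thresholds are satisfied.
keep = event_prob >= 0.8 and background_prob <= 0.2
```

Here the event probability is about 0.855 and the background probability about 0.005, so this peptide would satisfy the thresholds used in the example command (0.8 and 0.2).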
The plot command writes the calculated scores to a single TSV file with the columns "transcript" and "score", plus one likelihood column per sample.
Options:
--input <path>: Path to the input data file generated with the process command.
--output <path>: Path to the output TSV file.
Example:
predictosaurus plot --input path/to/scores.duckdb --output scores.tsv
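A file with this layout can be read using Python's standard csv module. The sketch below relies only on the documented "transcript" and "score" columns and treats every other column as a per-sample likelihood. The transcript names, sample column names, and numbers in `example_tsv` are invented for illustration, and the assumption that all values parse as floats is not confirmed by the README.

```python
import csv
import io

# Made-up content in the documented layout; a real scores.tsv will differ.
example_tsv = (
    "transcript\tscore\tsample1\tsample2\n"
    "transcriptA\t0.82\t0.91\t0.40\n"
    "transcriptB\t0.10\t0.05\t0.20\n"
)


def load_scores(handle):
    """Parse the TSV into dicts with a nested per-sample mapping."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        samples = {k: float(v) for k, v in row.items() if k not in ("transcript", "score")}
        rows.append({"transcript": row["transcript"], "score": float(row["score"]), "samples": samples})
    return rows


rows = load_scores(io.StringIO(example_tsv))
best = max(rows, key=lambda r: r["score"])  # row with the highest score
```

To read an actual output file, pass `open("scores.tsv", newline="")` instead of the in-memory example.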
This is an example of using Predictosaurus to build a graph, process it, and output the results:
# Step 1: Build the variant graph
predictosaurus build --calls calls.vcf --observations sample1=observations1.vcf sample2=observations2.vcf --output graphs.duckdb
# Step 2: Process the graph with a GFF file
predictosaurus process --features features.gff --reference reference.fasta --graph graphs.duckdb --output scores.duckdb
# Step 3: Write the scores to a TSV file
predictosaurus plot --input scores.duckdb --output scores.tsv
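When the workflow is driven from a script, the three steps can be assembled as argument lists and run in order. The Python sketch below uses only the flags shown in this README; the helper names `workflow_commands` and `run_workflow` and the `workdir` parameter are hypothetical, and `run_workflow` assumes predictosaurus is on the PATH.

```python
import subprocess


def workflow_commands(calls, observations, features, reference, workdir="."):
    """Build the argument lists for the build, process, and plot steps."""
    graph = f"{workdir}/graphs.duckdb"
    scores = f"{workdir}/scores.duckdb"
    tsv = f"{workdir}/scores.tsv"
    # Each observation file is passed as sample=path after --observations.
    obs_args = [f"{sample}={path}" for sample, path in observations.items()]
    return [
        ["predictosaurus", "build", "--calls", calls,
         "--observations", *obs_args, "--output", graph],
        ["predictosaurus", "process", "--features", features,
         "--reference", reference, "--graph", graph, "--output", scores],
        ["predictosaurus", "plot", "--input", scores, "--output", tsv],
    ]


def run_workflow(commands):
    """Run each step in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


commands = workflow_commands(
    "calls.vcf",
    {"sample1": "observations1.vcf", "sample2": "observations2.vcf"},
    "features.gff",
    "reference.fasta",
)
```

Calling `run_workflow(commands)` would then execute the steps; with `check=True`, a failing step raises `subprocess.CalledProcessError` and later steps do not run on incomplete input.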
Predictosaurus is licensed under the MIT License.