orphos-cli

Crates.io: orphos-cli
lib.rs: orphos-cli
version: 0.1.0
created_at: 2025-11-07 22:10:38.27199+00
updated_at: 2025-11-07 22:10:38.27199+00
description: Command-line interface for Orphos, a tool for finding protein-coding genes in microbial genomes.
repository: https://github.com/FullHuman/orphos
documentation: https://docs.rs/orphos-cli
id: 1922246
size: 96,171
Floriel (Ffloriel)
README

Orphos CLI

Command-line interface for Orphos, a fast, parallel Rust implementation of Prodigal for finding protein-coding genes in microbial genomes.

Features

  • 🚀 High Performance: Multi-threaded processing using Rayon
  • 💾 Memory Efficient: Optimized for large genomes and metagenomic assemblies
  • 🔄 Compatible: Output format compatible with original Prodigal
  • 🌍 Cross-Platform: Works on Linux, macOS, and Windows
  • 📊 Multiple Output Formats: GenBank, GFF3, SCO, and GCA formats
  • 🧬 Flexible Modes: Single genome and metagenomic analysis modes

Installation

Using Cargo

cargo install orphos-cli

From Source

git clone https://github.com/FullHuman/orphos.git
cd orphos
cargo install --path orphos-cli

Homebrew (macOS/Linux)

brew tap FullHuman/orphos
brew install orphos

Conda

conda install -c bioconda orphos
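
Whichever installation method you use, it can help to confirm that the binary is actually on your PATH before running an analysis (cargo install places binaries in ~/.cargo/bin by default). The helper below is a minimal sketch; `require_tool` is a hypothetical name, not part of Orphos.

```shell
# Return 0 if the named command is on PATH; otherwise print a hint and return 1.
# require_tool is a hypothetical helper, not something Orphos ships.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "error: '$1' not found on PATH" >&2
    return 1
}

# Usage: require_tool orphos && orphos --version
```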

Quick Start

Basic Usage

# Analyze a genome and output GenBank format
orphos -i genome.fasta -o genes.gbk

# Analyze with GFF3 output
orphos -i genome.fasta -f gff -o genes.gff

# Metagenomic mode for short contigs
orphos -i metagenome.fasta -p meta -o genes.gff

# Complete circular genome (closed ends)
orphos -i plasmid.fasta -c -o plasmid.gbk

Reading from stdin/stdout

# Input from stdin
cat genome.fasta | orphos -o genes.gbk

# Output to stdout
orphos -i genome.fasta > genes.gbk

# Pipe both
cat genome.fasta | orphos > genes.gbk
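
Because input defaults to stdin, a compressed FASTA file can be decompressed on the fly instead of being unpacked to disk first. This is a small sketch built only on the stdin behaviour shown above; `predict_from_gz` and its arguments are hypothetical names.

```shell
# Decompress a gzipped FASTA and stream it into orphos via stdin.
# $1 = gzipped FASTA file, $2 = output file (both hypothetical paths)
predict_from_gz() {
    gzip -dc "$1" | orphos -o "$2"
}

# Usage: predict_from_gz genome.fasta.gz genes.gbk
```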

Command-Line Options

Required/Input

| Option | Short | Long | Description |
|---|---|---|---|
| Input file | -i | --input | Input FASTA file (default: stdin) |
| Output file | -o | --output | Output file (default: stdout) |

Output Options

| Option | Short | Long | Default | Description |
|---|---|---|---|---|
| Format | -f | --format | gbk | Output format: gbk, gff, sco, gca |

Analysis Options

| Option | Short | Long | Default | Description |
|---|---|---|---|---|
| Mode | -p | --mode | single | Analysis mode: single or meta |
| Closed ends | -c | --closed | false | No genes off edges (for complete genomes) |
| Mask N's | -m | --mask | false | Mask runs of N's |
| Translation table | -g | --translation-table | auto | Translation table (1-25) |
| Training file | -t | --training | - | Use pre-trained parameters |

Other Options

| Option | Short | Long | Description |
|---|---|---|---|
| Quiet | -q | --quiet | Suppress progress messages |
| Help | -h | --help | Display help information |
| Version | -V | --version | Display version information |

Output Formats

GenBank (gbk)

Rich annotation format with gene features, translations, and metadata.

orphos -i genome.fasta -f gbk -o genes.gbk

GFF3 (gff)

General Feature Format version 3, widely used in genomics pipelines.

orphos -i genome.fasta -f gff -o genes.gff
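
GFF3 is tab-separated with nine columns, so standard text tools can summarise it. The sketch below counts CDS features and reports their mean length. It assumes Orphos writes predicted genes as CDS features, as Prodigal does; `gff_cds_summary` is a hypothetical helper name.

```shell
# Print the number of CDS features and their mean length in bp from a GFF3 file.
# GFF3 columns: 3 = feature type, 4 = start, 5 = end (1-based, inclusive).
# Assumption: predicted genes appear with type "CDS".
gff_cds_summary() {
    awk -F'\t' '
        /^#/ { next }                              # skip directives and comments
        $3 == "CDS" { n++; total += $5 - $4 + 1 }  # inclusive coordinates
        END {
            if (n > 0) printf "%d CDS, mean length %.1f bp\n", n, total / n
            else       print "0 CDS"
        }
    ' "$1"
}

# Usage: gff_cds_summary genes.gff
```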

Simple Coordinate Output (sco)

Tab-delimited gene coordinates for easy parsing.

orphos -i genome.fasta -f sco -o genes.sco

Gene Coordinate Annotation (gca)

Compact coordinate format.

orphos -i genome.fasta -f gca -o genes.gca

Analysis Modes

Single Genome Mode (default)

Use for complete or near-complete genomes (>100kb). Orphos will train on the genome to optimize gene prediction accuracy.

orphos -i complete_genome.fasta -o genes.gbk

Best for:

  • Complete bacterial genomes
  • Complete archaeal genomes
  • Large contigs or chromosomes
  • Closed genomes

Metagenomic Mode

Use for short contigs or mixed metagenomic assemblies. Uses pre-trained parameters instead of training on the input.

orphos -i metagenome_contigs.fasta -p meta -o genes.gff

Best for:

  • Metagenomic assemblies
  • Short contigs (<100kb)
  • Mixed-species samples
  • Fragmented sequences
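
When a script has to handle inputs of unknown size, the 100 kb guideline above can be applied automatically. The following is a rough heuristic sketch, not Orphos behaviour: it sums the sequence length of a FASTA file and prints the suggested mode. `choose_mode` and its threshold argument are hypothetical, and total length is only a proxy, since a large metagenome of short contigs would still exceed the threshold.

```shell
# Print "single" if the FASTA holds at least $2 bases in total, else "meta".
# $1 = FASTA file, $2 = threshold in bases (default 100000, from the guideline above)
# This is a heuristic only; mixed-species input should use meta regardless of size.
choose_mode() {
    awk -v min="${2:-100000}" '
        /^>/ { next }                               # skip header lines
        { gsub(/[ \t\r]/, ""); total += length($0) } # count sequence characters
        END { if (total >= min) print "single"; else print "meta" }
    ' "$1"
}

# Usage: orphos -i contigs.fasta -p "$(choose_mode contigs.fasta)" -o genes.gff
```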

Advanced Examples

Complete Circular Genome

For complete circular genomes (chromosomes, plasmids), use the -c flag to prevent genes from being called off the edges:

orphos -i circular_plasmid.fasta -c -o plasmid.gbk

Custom Translation Table

Specify a custom genetic code (translation table):

# Use translation table 4 (Mycoplasma/Spiroplasma)
orphos -i mycoplasma.fasta -g 4 -o genes.gbk

# Use translation table 11 (Bacterial and Archaeal)
orphos -i bacteria.fasta -g 11 -o genes.gbk

Masking Low-Quality Regions

Mask runs of N's in low-quality sequences:

orphos -i draft_assembly.fasta -m -o genes.gff

Batch Processing

Process multiple genomes:

# Create the output directory first so orphos can write into it
mkdir -p results
for genome in genomes/*.fasta; do
    base=$(basename "$genome" .fasta)
    orphos -i "$genome" -f gff -o "results/${base}.gff"
done

Pipeline Integration

Integrate with other bioinformatics tools:

# Find genes and extract protein sequences
orphos -i genome.fasta -f gff -o genes.gff
# ... then use genes.gff with other tools

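# Combine with annotation pipelines
# Note: Prokka documents --proteins as taking a FASTA or GenBank file,
# so genes.gff may need converting before this step
orphos -i assembly.fasta -p meta -f gff -o genes.gff
prokka --proteins genes.gff --outdir annotation assembly.fasta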
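
Many downstream tools take BED rather than GFF3. As one illustration of using the GFF output with other tools, the sketch below converts CDS features to six-column BED. It assumes genes are written as CDS features with an ID attribute; `gff_to_bed` is a hypothetical helper, and the BED score is set to 0 because GFF scores are not on BED's 0-1000 scale.

```shell
# Convert CDS features in a GFF3 file to 6-column BED on stdout.
# GFF3 starts are 1-based; BED starts are 0-based, so start becomes start - 1.
# Assumption: each CDS carries an ID=... attribute; "." is used when it does not.
gff_to_bed() {
    awk -F'\t' -v OFS='\t' '
        /^#/ { next }
        $3 == "CDS" {
            name = "."
            n = split($9, attrs, ";")
            for (i = 1; i <= n; i++)
                if (attrs[i] ~ /^ID=/) name = substr(attrs[i], 4)
            print $1, $4 - 1, $5, name, 0, $7
        }
    ' "$1"
}

# Usage: gff_to_bed genes.gff > genes.bed
```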

Performance Tips

  1. Use multiple cores: Orphos automatically uses all available CPU cores via Rayon
  2. Metagenomic mode for many small contigs: Faster than single mode for fragmented assemblies
  3. Batch processing: Process multiple files in parallel using shell scripting
  4. Large files: Orphos handles multi-GB files efficiently
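
Tips 1 and 3 interact: if each Orphos process already uses every core, running several at once oversubscribes the machine. One way to balance the two, sketched below, is to run a few files concurrently with xargs -P while capping threads per process. This assumes Orphos uses Rayon's default thread pool, which reads the RAYON_NUM_THREADS environment variable; that is not confirmed by this README. `batch_parallel` and its arguments are hypothetical, and -maxdepth, -print0, -0 and -P are GNU/BSD extensions.

```shell
# Run orphos on every *.fasta file in a directory, several files at a time.
# $1 = input dir, $2 = output dir, $3 = concurrent jobs (default 4),
# $4 = threads per job (default 2)
# Assumption: orphos honours RAYON_NUM_THREADS (true for Rayon's default pool;
# if orphos builds its own pool, the variable has no effect).
batch_parallel() {
    in_dir=$1
    out_dir=$2
    njobs=${3:-4}
    nthreads=${4:-2}
    mkdir -p "$out_dir"
    find "$in_dir" -maxdepth 1 -name '*.fasta' -print0 |
        RAYON_NUM_THREADS="$nthreads" OUT_DIR="$out_dir" \
        xargs -0 -n 1 -P "$njobs" sh -c '
            [ -n "$1" ] || exit 0                 # nothing to do on empty input
            base=$(basename "$1" .fasta)
            orphos -q -i "$1" -f gff -o "$OUT_DIR/$base.gff"
        ' _
}

# Usage: batch_parallel genomes results 4 2
```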

Translation Tables

Orphos supports NCBI translation tables 1-25 (excluding 7, 8, 17-20). Common tables:

| Table | Name | Organisms |
|---|---|---|
| 1 | Standard | Most eukaryotes |
| 4 | Mycoplasma/Spiroplasma | Mycoplasma, Spiroplasma |
| 11 | Bacterial, Archaeal, Plant Plastid | Most bacteria and archaea (default) |
| 25 | Candidate Division SR1, Gracilibacteria | Certain bacteria |
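
In scripts that take the table number from user input, the supported range stated above (1-25, excluding 7, 8, and 17-20) can be checked before Orphos is invoked. A minimal sketch follows; `valid_table` is a hypothetical helper, and Orphos may perform its own validation.

```shell
# Return 0 if $1 is a translation table number listed as supported above:
# 1-25, excluding 7, 8 and 17-20. Anything else returns 1.
valid_table() {
    case "$1" in
        7|8|17|18|19|20) return 1 ;;      # excluded table numbers
        [1-9]|1[0-9]|2[0-5]) return 0 ;;  # remaining values in 1-25
        *) return 1 ;;                    # out of range or not a number
    esac
}

# Usage: valid_table "$table" && orphos -i genome.fasta -g "$table" -o genes.gbk
```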

Contributing

We welcome contributions! Please see the main repository for contribution guidelines.

License

This project is licensed under the GPL-3.0 License - see the LICENSE file for details.

Citation

If you use Orphos in your research, please cite:

# TODO: Add citation information

Acknowledgments

This project is inspired by the original Prodigal by Doug Hyatt. We thank the authors for their groundbreaking work in prokaryotic gene prediction.
