| Crates.io | orphos-core |
| lib.rs | orphos-core |
| version | 0.1.0 |
| created_at | 2025-11-07 17:55:53.709331+00 |
| updated_at | 2025-11-07 17:55:53.709331+00 |
| description | Core library for Orphos, a tool for finding protein-coding genes in microbial genomes. |
| homepage | |
| repository | https://github.com/FullHuman/orphos |
| max_upload_size | |
| id | 1921947 |
| size | 3,128,991 |
Core library for Orphos, a high-performance Rust implementation of Prodigal (prokaryotic gene prediction algorithms). This crate provides the foundational gene-finding capabilities for identifying protein-coding genes in microbial genomes.
orphos-core implements an unsupervised machine learning approach for finding genes in prokaryotic genomes. It uses dynamic programming and statistical models trained on genomic features to predict gene locations with high accuracy.
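To make the notion of a gene candidate concrete, the following is a minimal sketch of a naive forward-strand open reading frame (ORF) scan. It is not the Orphos algorithm, which scores candidates with trained statistical models and dynamic programming; it only pairs an `ATG` with the next in-frame stop codon. The names `Orf`, `is_stop`, and `find_forward_orfs` are hypothetical and do not come from orphos-core.

```rust
/// A candidate open reading frame on the forward strand (hypothetical type,
/// not part of orphos-core). Coordinates are 1-based and inclusive.
#[derive(Debug, PartialEq)]
struct Orf {
    start: usize,
    end: usize,
}

/// Returns true if `codon` is one of the three standard stop codons.
fn is_stop(codon: &[u8]) -> bool {
    matches!(codon, b"TAA" | b"TAG" | b"TGA")
}

/// Naive scan: in each reading frame, pair the first ATG seen with the next
/// in-frame stop codon and keep ORFs of at least `min_len` bases.
fn find_forward_orfs(seq: &str, min_len: usize) -> Vec<Orf> {
    let bytes = seq.as_bytes();
    let mut orfs = Vec::new();
    for frame in 0..3 {
        let mut open: Option<usize> = None; // 0-based index of the pending ATG
        let mut i = frame;
        while i + 3 <= bytes.len() {
            let codon = &bytes[i..i + 3];
            if open.is_none() && codon == b"ATG" {
                open = Some(i);
            } else if is_stop(codon) {
                if let Some(s) = open.take() {
                    let len = i + 3 - s; // length includes the stop codon
                    if len >= min_len {
                        orfs.push(Orf { start: s + 1, end: i + 3 });
                    }
                }
            }
            i += 3;
        }
    }
    orfs
}
```

A real gene finder has to choose among many overlapping candidates like these on both strands, which is where scoring and dynamic programming come in.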
Add to your Cargo.toml:
[dependencies]
orphos-core = "0.1.0"
use orphos_core::{OrphosAnalyzer, config::OrphosConfig};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create analyzer with default configuration
    let mut analyzer = OrphosAnalyzer::new(OrphosConfig::default());

    // Analyze a FASTA file
    let results = analyzer.analyze_file("genome.fasta")?;
    println!("Found {} genes", results.genes.len());
    println!("{}", results.output);

    Ok(())
}
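For readers unfamiliar with the input format, here is a minimal sketch of what reading a FASTA file involves: header lines start with `>`, and the sequence lines that follow may wrap. The helper `parse_fasta` is hypothetical and uses only the standard library; it is not the reader used by orphos-core.

```rust
/// Parses FASTA-formatted text into (identifier, sequence) pairs.
/// Hypothetical helper for illustration; not the orphos-core reader.
fn parse_fasta(text: &str) -> Vec<(String, String)> {
    let mut records: Vec<(String, String)> = Vec::new();
    for line in text.lines() {
        let line = line.trim();
        if line.is_empty() {
            continue;
        }
        if let Some(header) = line.strip_prefix('>') {
            // The identifier is the first whitespace-delimited token.
            let id = header.split_whitespace().next().unwrap_or("").to_string();
            records.push((id, String::new()));
        } else if let Some((_, seq)) = records.last_mut() {
            // Sequence lines may wrap; concatenate them in upper case.
            seq.push_str(&line.to_ascii_uppercase());
        }
    }
    records
}
```

Sequence lines that appear before any header are ignored in this sketch; a production reader would more likely report them as an error.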
use orphos_core::{OrphosAnalyzer, config::OrphosConfig};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut analyzer = OrphosAnalyzer::new(OrphosConfig::default());

    let sequence = "ATGAAACGCATTAGCACCACCATTACCACCACCATCACCATTACCACAGGTAACGGTGCGGGCTGA...";
    let results = analyzer.analyze_sequence(sequence, Some("MyGenome".to_string()))?;

    // Report each predicted gene with its coordinates and strand
    for gene in &results.genes {
        println!("Gene at {}..{} on {} strand",
            gene.start, gene.end,
            if gene.strand == bio::bio_types::strand::Strand::Forward { "+" } else { "-" }
        );
    }

    Ok(())
}
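A gene reported on the "-" strand is read from the reverse complement of the input. The sketch below shows how a coding sequence could be recovered from coordinates and strand, assuming the coordinates are 1-based, inclusive, and given on the forward strand (the GFF convention; whether `gene.start` and `gene.end` follow it is an assumption here). `reverse_complement` and `gene_sequence` are hypothetical helpers, not orphos-core functions.

```rust
/// Returns the reverse complement of a DNA sequence.
/// Bases other than A, C, G, T are emitted as 'N'.
fn reverse_complement(seq: &str) -> String {
    seq.chars()
        .rev()
        .map(|base| match base.to_ascii_uppercase() {
            'A' => 'T',
            'T' => 'A',
            'C' => 'G',
            'G' => 'C',
            _ => 'N',
        })
        .collect()
}

/// Extracts the coding sequence for a gene given 1-based inclusive
/// forward-strand coordinates; `forward` is false for the "-" strand.
/// Returns None when the coordinates fall outside the genome.
fn gene_sequence(genome: &str, start: usize, end: usize, forward: bool) -> Option<String> {
    if start == 0 || start > end {
        return None;
    }
    let slice = genome.get(start - 1..end)?;
    Some(if forward {
        slice.to_string()
    } else {
        reverse_complement(slice)
    })
}
```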
use orphos_core::OrphosAnalyzer;
use orphos_core::config::{OrphosConfig, OutputFormat};

let config = OrphosConfig {
    closed_ends: true,          // Complete genome (circular)
    mask_n_runs: true,          // Mask stretches of N's
    output_format: OutputFormat::Gff,
    num_threads: Some(4),       // Use 4 threads
    ..Default::default()        // Keep defaults for all other options
};
let mut analyzer = OrphosAnalyzer::new(config);
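To illustrate what masking stretches of N's involves, here is a rough sketch that locates runs of `N` so that they can be excluded from gene prediction. The function `find_n_runs` and its `min_run` threshold are assumptions made for this example; the source does not state how orphos-core detects or sizes masked regions.

```rust
/// Finds runs of 'N' (case-insensitive) at least `min_run` bases long and
/// returns them as 0-based half-open ranges. Hypothetical helper.
fn find_n_runs(seq: &str, min_run: usize) -> Vec<std::ops::Range<usize>> {
    let mut runs = Vec::new();
    let mut run_start: Option<usize> = None;
    for (i, b) in seq.bytes().enumerate() {
        let is_n = b == b'N' || b == b'n';
        match (is_n, run_start) {
            // An N outside a run opens a new run.
            (true, None) => run_start = Some(i),
            // A non-N base closes the current run.
            (false, Some(s)) => {
                if i - s >= min_run {
                    runs.push(s..i);
                }
                run_start = None;
            }
            _ => {}
        }
    }
    // Close a run that extends to the end of the sequence.
    if let Some(s) = run_start {
        if seq.len() - s >= min_run {
            runs.push(s..seq.len());
        }
    }
    runs
}
```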
For analyzing short contigs or mixed community samples:
use orphos_core::OrphosAnalyzer;
use orphos_core::config::OrphosConfig;

let config = OrphosConfig {
    metagenomic: true,          // Enable metagenomic mode
    ..Default::default()
};
let mut analyzer = OrphosAnalyzer::new(config);
let results = analyzer.analyze_file("metagenome.fasta")?;
- `config`: Configuration options and output format settings
- `engine`: Main analysis engine with training and prediction logic
- `types`: Core data structures (Gene, Training, error types)
- `results`: Gene prediction results and sequence information
- `sequence`: Sequence encoding, I/O, and processing utilities
- `algorithms`: Gene-finding algorithms including:
  - `node`: Gene node management, creation, and scoring
  - `training`: Training algorithms for Shine-Dalgarno and non-SD models
  - `output`: Output formatters for GenBank, GFF, GCA, and SCO
  - `bitmap`: Efficient sequence encoding utilities
  - `metagenomic`: Metagenomic mode presets and models

GenBank (`.gbk`)
Rich annotation format with full feature information:
LOCUS MyGenome 4641652 bp DNA linear BCT
FEATURES Location/Qualifiers
CDS 190..255
/gene="1"
/protein_id="MyGenome_1"
/translation="MTKRSAAAAAAVAAGMTSA"
GFF (`.gff`)
Standard genome annotation format:
##gff-version 3
MyGenome Orphos CDS 190 255 . + 0 ID=MyGenome_1;
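The GFF3 line above has nine tab-separated columns: seqid, source, type, start, end, score, strand, phase, and attributes. As a sketch of how such a line could be assembled, the following uses a hypothetical `GeneRecord` struct and `to_gff3_line` function; neither is part of orphos-core, and the fixed score (`.`) and phase (`0`) simply mirror the sample line.

```rust
/// Minimal gene record for illustration (hypothetical; the real
/// orphos-core gene type may carry different fields).
struct GeneRecord {
    seq_id: String,
    index: usize,
    start: usize,
    end: usize,
    forward: bool,
}

/// Formats one gene as a tab-separated GFF3 feature line with the nine
/// standard columns: seqid, source, type, start, end, score, strand,
/// phase, attributes.
fn to_gff3_line(gene: &GeneRecord) -> String {
    let strand = if gene.forward { '+' } else { '-' };
    format!(
        "{id}\tOrphos\tCDS\t{start}\t{end}\t.\t{strand}\t0\tID={id}_{index};",
        id = gene.seq_id,
        start = gene.start,
        end = gene.end,
        strand = strand,
        index = gene.index,
    )
}
```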
GCA (`.gca`)
Tab-delimited gene coordinate annotation.
SCO (`.sco`)
Simple coordinate output with minimal information.
| Option | Type | Default | Description |
|---|---|---|---|
| `metagenomic` | `bool` | `false` | Enable metagenomic mode for fragments |
| `closed_ends` | `bool` | `false` | Treat sequences as complete genomes |
| `mask_n_runs` | `bool` | `false` | Mask runs of N characters |
| `force_non_sd` | `bool` | `false` | Disable Shine-Dalgarno detection |
| `quiet` | `bool` | `false` | Suppress informational output |
| `output_format` | `OutputFormat` | `Genbank` | Output format selection |
| `translation_table` | `Option<u8>` | `None` | NCBI genetic code table (1-25) |
| `num_threads` | `Option<usize>` | `None` | Number of parallel threads |
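The `translation_table` option selects the genetic code used to turn a predicted coding sequence into protein, as in the `/translation` qualifier of the GenBank output. The sketch below hardcodes the standard code (NCBI table 1) to show the codon lookup; `STANDARD_CODE`, `base_index`, and `translate` are hypothetical names, and how orphos-core represents its tables internally is not stated in the source.

```rust
/// Amino acids of the standard genetic code (NCBI table 1), ordered with
/// bases T, C, A, G at each codon position.
const STANDARD_CODE: &[u8; 64] =
    b"FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

/// Maps a base to its index in T, C, A, G order.
fn base_index(base: u8) -> Option<usize> {
    match base.to_ascii_uppercase() {
        b'T' => Some(0),
        b'C' => Some(1),
        b'A' => Some(2),
        b'G' => Some(3),
        _ => None,
    }
}

/// Translates a coding sequence codon by codon, stopping at the first stop
/// codon. Codons containing ambiguous bases become 'X'; a trailing partial
/// codon is ignored.
fn translate(cds: &str) -> String {
    let mut protein = String::new();
    for codon in cds.as_bytes().chunks_exact(3) {
        let idx = match (base_index(codon[0]), base_index(codon[1]), base_index(codon[2])) {
            (Some(a), Some(b), Some(c)) => a * 16 + b * 4 + c,
            _ => {
                protein.push('X');
                continue;
            }
        };
        let aa = STANDARD_CODE[idx] as char;
        if aa == '*' {
            break; // stop codon ends the protein
        }
        protein.push(aa);
    }
    protein
}
```

Supporting other tables in this style would mean swapping the 64-character string for the one published for the chosen NCBI code.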
All operations return Result<T, OrphosError> with detailed error types:
use orphos_core::types::OrphosError;

match analyzer.analyze_file("genome.fasta") {
    Ok(results) => println!("Success: {} genes", results.genes.len()),
    Err(OrphosError::SequenceTooShort { length, min }) => {
        eprintln!("Sequence too short: {} bp (minimum: {} bp)", length, min);
    }
    Err(OrphosError::IoError(e)) => {
        eprintln!("I/O error: {}", e);
    }
    // Any other error variant
    Err(e) => eprintln!("Error: {}", e),
}
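For readers new to this pattern, here is a self-contained sketch of how an error enum with the two variants matched above might be defined. `DemoError` and `check_length` are hypothetical stand-ins written for this example; the actual definition of `OrphosError` may have more variants and different messages.

```rust
use std::fmt;
use std::io;

/// Hypothetical error type mirroring the two variants matched above.
#[derive(Debug)]
enum DemoError {
    SequenceTooShort { length: usize, min: usize },
    IoError(io::Error),
}

impl fmt::Display for DemoError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DemoError::SequenceTooShort { length, min } => {
                write!(f, "sequence too short: {} bp (minimum: {} bp)", length, min)
            }
            DemoError::IoError(e) => write!(f, "I/O error: {}", e),
        }
    }
}

impl std::error::Error for DemoError {}

// Lets `?` convert an `io::Error` into `DemoError` automatically.
impl From<io::Error> for DemoError {
    fn from(e: io::Error) -> Self {
        DemoError::IoError(e)
    }
}

/// Rejects sequences shorter than `min` bases (the threshold is a made-up
/// parameter for this sketch) and otherwise returns the sequence length.
fn check_length(seq: &str, min: usize) -> Result<usize, DemoError> {
    if seq.len() < min {
        return Err(DemoError::SequenceTooShort { length: seq.len(), min });
    }
    Ok(seq.len())
}
```

Struct-like variants such as `SequenceTooShort { length, min }` let callers destructure the details in a `match` arm instead of parsing an error string.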
Contributions are welcome! Please see the main Orphos repository for contribution guidelines.
This project is licensed under the GNU General Public License v3.0 or later - see the LICENSE file for details.
If you use Orphos in your research, please cite:
@software{orphos,
  title = {Orphos: High-Performance Prokaryotic Gene Prediction},
  author = {Floriel Fedry},
  year = {2025},
  url = {https://github.com/FullHuman/orphos}
}
This implementation is based on Prodigal, originally developed by Doug Hyatt. Orphos provides a modern, type-safe Rust implementation while maintaining compatibility with the original algorithms.