Crates.io | hpo |
lib.rs | hpo |
version | 0.11.0 |
source | src |
created_at | 2022-12-10 19:24:04.43298 |
updated_at | 2024-09-09 19:59:58.886042 |
description | Human Phenotype Ontology Similarity |
homepage | https://github.com/anergictcell/hpo |
repository | https://github.com/anergictcell/hpo |
id | 733986 |
size | 5,557,919 |
HPO, the Human Phenotype Ontology, is a standard vocabulary of phenotypic abnormalities in human diseases. It is an ontology, so all terms are connected to each other, similar to a directed graph.
This library provides convenient APIs to work with the ontology. The main goals are to compare terms (or sets of terms) to each other and to run statistics for enrichment analysis.
This library is basically a Rust implementation of PyHPO, but contains some additional features as well.
Identify patient cohorts based on clinical features
Cluster patients or other clinical information for GWAS
Phenotype to Genotype studies
HPO similarity analysis
Graph-based analysis of phenotypes, genes and diseases
Enrichment analysis of genes and diseases in sets of HPO terms
Completely written in Rust, so it's blazingly fast™ (see the benchmarks below)
The library is pretty much feature-complete, at least for my use cases. If you have any feature requests, please open an Issue or get in touch. I'm very interested in feedback and in new ideas for improvement.
The API is mostly stable, but I might refactor some parts for easier use and better performance.
If you find this project interesting and want to contribute, please get in touch. I could definitely use some help.
The public API is fully documented on docs.rs.
The main structs used in hpo are:
Ontology is the main struct and entrypoint in hpo.
HpoTerm represents a single HPO term and contains plenty of functionality around it.
HpoSet is a collection of HpoTerms, like a patient's clinical information.
Gene represents a single gene, including information about associated HpoTerms.
OmimDisease represents a single OMIM disease, including information about associated HpoTerms.
OrphaDisease represents a single ORPHA disease, including information about associated HpoTerms.
The most relevant modules are:
annotations contains the Gene, OmimDisease and OrphaDisease structs, and some related important types.
similarity contains structs and helper functions for similarity comparisons for HpoTerm and HpoSet.
stats contains functions to calculate the hypergeometric enrichment score of genes or diseases.
Some (more or less random) examples are included in the examples folder.
use hpo::{Ontology, HpoTermId};
use hpo::annotations::{GeneId, OmimDiseaseId, OrphaDiseaseId};

fn example() {
    let ontology = Ontology::from_binary("tests/ontology.hpo").unwrap();

    // iterate HPO terms
    for term in &ontology {
        // do something with term
    }

    // iterate Genes
    for gene in ontology.genes() {
        // do something with gene
    }

    // iterate omim diseases
    for disease in ontology.omim_diseases() {
        // do something with disease
    }

    // iterate orpha diseases
    for disease in ontology.orpha_diseases() {
        // do something with disease
    }

    // get a single HPO term using HPO ID
    let hpo_id = HpoTermId::try_from("HP:0000123").unwrap();
    let term = ontology.hpo(hpo_id);

    // get a single HPO term using `u32` part of HPO ID
    let term = ontology.hpo(123u32);

    // get a single Omim disease
    let disease_id = OmimDiseaseId::from(12345u32);
    let disease = ontology.omim_disease(&disease_id);

    // get a single Orpha disease
    let disease_id = OrphaDiseaseId::from(12345u32);
    let disease = ontology.orpha_disease(&disease_id);

    // get a single Gene
    let hgnc_id = GeneId::from(12345u32);
    let gene = ontology.gene(&hgnc_id);

    // get a single Gene by its symbol
    let gene = ontology.gene_by_name("GBA");
}
use hpo::Ontology;

fn example() {
    let ontology = Ontology::from_binary("tests/ontology.hpo").unwrap();

    // HP:0000707 is the term "Abnormality of the nervous system"
    let term = ontology.hpo(707u32).unwrap();
    assert_eq!("Abnormality of the nervous system", term.name());
    // HPO IDs are zero-padded to 7 digits
    assert_eq!("HP:0000707".to_string(), term.id().to_string());

    // iterate the direct parents
    for p in term.parents() {
        println!("{}", p.name());
    }

    // iterate the direct children
    for c in term.children() {
        println!("{}", c.name());
    }

    // HP:0000001 is the root term of the ontology
    let term2 = ontology.hpo(1u32).unwrap();
    assert!(term2.parent_of(&term));
    assert!(term.child_of(&term2));
}
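To illustrate what parent and child relationships mean in an ontology, here is a minimal standalone sketch that does not use hpo at all. It models a toy ontology as a map from each term ID to its direct parents and walks that map to collect all direct and indirect parents. The IDs, the edges and the function names `toy_ontology` and `ancestors` are made up for this illustration and say nothing about how hpo stores terms internally.

```rust
use std::collections::{BTreeSet, HashMap};

/// Toy ontology (made-up IDs): maps each term to its direct parents.
fn toy_ontology() -> HashMap<u32, Vec<u32>> {
    HashMap::from([
        (1, vec![]),     // root term, no parents
        (2, vec![1]),
        (3, vec![1]),
        (4, vec![2, 3]), // a term may have more than one parent
    ])
}

/// Collects all direct and indirect parents of `term`.
fn ancestors(parents: &HashMap<u32, Vec<u32>>, term: u32) -> BTreeSet<u32> {
    let mut seen = BTreeSet::new();
    let mut stack = vec![term];
    while let Some(current) = stack.pop() {
        // unknown terms simply have no parents
        for &parent in parents.get(&current).into_iter().flatten() {
            // only follow a parent the first time we see it
            if seen.insert(parent) {
                stack.push(parent);
            }
        }
    }
    seen
}

fn main() {
    let ontology = toy_ontology();
    // term 4 reaches the root via both of its parents
    println!("{:?}", ancestors(&ontology, 4)); // prints {1, 2, 3}
}
```

In this toy model, "term A is a parent of term B" simply means that A is contained in the ancestor set of B.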
use hpo::Ontology;
use hpo::similarity::GraphIc;
use hpo::term::InformationContentKind;

fn example() {
    let ontology = Ontology::from_binary("tests/ontology.hpo").unwrap();
    let term1 = ontology.hpo(123u32).unwrap();
    let term2 = ontology.hpo(1u32).unwrap();

    // use the OMIM-based information content for the similarity calculation
    let ic = GraphIc::new(InformationContentKind::Omim);
    let similarity = term1.similarity_score(&term2, &ic);
}
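For readers unfamiliar with information-content based similarity, the following standalone sketch shows the general idea with the classic Resnik measure. It is not the GraphIc method used above and it does not call into hpo. It assumes that each term's ancestor set (including the term itself) is already known, and all IDs and annotation counts are invented for the example.

```rust
use std::collections::{BTreeSet, HashMap};

/// Information content of a term that annotates `annotated` out of `total`
/// entities (e.g. diseases). Rare terms get a high score.
fn information_content(annotated: usize, total: usize) -> f64 {
    if annotated == 0 || total == 0 {
        return 0.0;
    }
    -((annotated as f64) / (total as f64)).ln()
}

/// Resnik similarity: the information content of the most informative
/// common ancestor. `a` and `b` hold each term's ancestors plus the term itself.
fn resnik(a: &BTreeSet<u32>, b: &BTreeSet<u32>, ic: &HashMap<u32, f64>) -> f64 {
    a.intersection(b)
        .filter_map(|id| ic.get(id).copied())
        .fold(0.0, f64::max)
}

fn main() {
    // made-up annotation counts out of 100 diseases
    let ic = HashMap::from([
        (1, information_content(100, 100)), // root: annotates everything
        (2, information_content(50, 100)),
        (3, information_content(10, 100)),
        (4, information_content(5, 100)),
    ]);
    let term3 = BTreeSet::from([1, 2, 3]);
    let term4 = BTreeSet::from([1, 2, 4]);
    println!("similarity: {:.3}", resnik(&term3, &term4, &ic));
}
```

The two terms share the ancestors 1 and 2, and term 2 is the more specific of the two, so it determines the score.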
Identify which genes (or diseases) are enriched in a set of HpoTerms, e.g. in the clinical information of a patient or patient cohort.
use hpo::Ontology;
use hpo::{HpoSet, term::HpoGroup};
use hpo::stats::hypergeom::gene_enrichment;

fn example() {
    let ontology = Ontology::from_binary("tests/ontology.hpo").unwrap();

    // collect the patient's HPO terms
    let mut hpos = HpoGroup::new();
    hpos.insert(2943u32);
    hpos.insert(8458u32);
    hpos.insert(100884u32);
    hpos.insert(2944u32);
    hpos.insert(2751u32);
    let patient_ci = HpoSet::new(&ontology, hpos);

    let mut enrichments = gene_enrichment(&ontology, &patient_ci);

    // the results are not sorted by default
    enrichments.sort_by(|a, b| {
        a.pvalue().partial_cmp(&b.pvalue()).unwrap()
    });

    for gene in enrichments {
        println!("{}\t{}\t({})", gene.id(), gene.pvalue(), gene.enrichment());
    }
}
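As background for the p-values above, here is a small standalone sketch of a hypergeometric test. It computes the probability of drawing at least `observed` marked items when taking `draws` items from a population that contains `successes` marked items. The function names, the numbers and the mapping of these parameters to genes and HPO terms are assumptions for illustration; the sketch uses a naive floating point binomial coefficient that is only suitable for small inputs and is not how hpo computes its scores.

```rust
/// Binomial coefficient "n choose k" as a float (fine for small toy inputs).
fn choose(n: u64, k: u64) -> f64 {
    if k > n {
        return 0.0;
    }
    let k = k.min(n - k);
    (0..k).fold(1.0, |acc, i| acc * (n - i) as f64 / (i + 1) as f64)
}

/// Probability of seeing at least `observed` marked items in `draws` draws
/// without replacement. Assumes `successes <= population`.
fn hypergeom_sf(population: u64, successes: u64, draws: u64, observed: u64) -> f64 {
    let upper = successes.min(draws);
    (observed..=upper)
        .map(|k| {
            choose(successes, k) * choose(population - successes, draws - k)
                / choose(population, draws)
        })
        .sum()
}

fn main() {
    // 10 terms in total, 4 of them linked to a gene (made-up numbers);
    // a patient has 3 terms, 2 of which are linked to that gene
    let p = hypergeom_sf(10, 4, 3, 2);
    println!("p-value: {p:.4}");
}
```

A small p-value means that the overlap is unlikely to occur by chance, which is why the enrichment results are usually sorted by ascending p-value.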
As the saying goes: "Make it work, make it good, make it fast". The work and good parts are realized in PyHPO. And even though I tried my best to make it fast, I was still hungry for more. So I started developing the hpo Rust library in December 2022. Even without micro-benchmarking and tuning performance as much as I did for PyHPO, hpo is already much faster.
The benchmarks below were not run scientifically, and your mileage may vary. I used a MacBook Air M1, rustc 1.68.0, Python 3.9 and /usr/bin/time for timing.
Benchmark | PyHPO | hpo (single-threaded) | hpo (multi-threaded) |
---|---|---|---|
Read and Parse Ontology | 6.4 s | 0.22 s | 0.22 s |
Similarity of 17,245 x 1,000 terms | 98.5 s | 4.6 s | 1.0 s |
Similarity of GBA1 to all Diseases | 380 s | 15.8 s | 3.0 s |
Disease enrichment in all Genes | 11.8 s | 0.4 s | 0.3 s |
Common ancestors of 17,245 x 10,000 terms | 225.2 s | 10.5 s | 2.1 s |
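Pairwise workloads like the ones in the table are easy to spread over several threads, because every pair can be scored independently. The following sketch shows one way to do that using only the standard library. How hpo itself parallelizes its work is not described here, and both `score` (a placeholder for a real similarity function) and `pairwise_sum` are hypothetical names for this example.

```rust
use std::thread;

/// Placeholder score (hypothetical): stands in for a real term similarity.
fn score(a: u32, b: u32) -> u64 {
    u64::from(a.min(b))
}

/// Sums `score` over all (row, col) pairs, splitting the rows over
/// at most `workers` scoped threads.
fn pairwise_sum(rows: &[u32], cols: &[u32], workers: usize) -> u64 {
    let workers = workers.max(1);
    // round up so that all rows are covered; never use a chunk size of 0
    let chunk_size = ((rows.len() + workers - 1) / workers).max(1);
    thread::scope(|scope| {
        let handles: Vec<_> = rows
            .chunks(chunk_size)
            .map(|chunk| {
                scope.spawn(move || {
                    chunk
                        .iter()
                        .flat_map(|&a| cols.iter().map(move |&b| score(a, b)))
                        .sum::<u64>()
                })
            })
            .collect();
        // wait for every worker and add up the partial sums
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .sum::<u64>()
    })
}

fn main() {
    let rows = [1, 2, 3];
    let cols = [2, 5];
    println!("{}", pairwise_sum(&rows, &cols, 2)); // prints 11
}
```

Scoped threads can borrow `rows` and `cols` directly, so no data has to be copied or wrapped in reference-counted pointers.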
Some information about the implementation plans is available in the Technical Design document.