# HPO HPO, the [Human Phenotype Ontology](https://hpo.jax.org/app/) is a standard vocabulary of phenotypic abnormalities in human diseases. It is an Ontology, so all terms are connected to each other, similar to a directed graph. This library provides convenient APIs to work with the ontology. The main goals are to compare terms - or sets of terms - to each other and run statistics for enrichment analysis. This library is basically a Rust implementation of [PyHPO](https://pypi.org/project/pyhpo/), but contains some additional features as well. ### Features - πŸ‘« Identify patient cohorts based on clinical features - πŸ‘¨β€πŸ‘§β€πŸ‘¦ Cluster patients or other clinical information for GWAS - πŸ©»β†’πŸ§¬ Phenotype to Genotype studies - 🍎🍊 HPO similarity analysis - πŸ•ΈοΈ Graph based analysis of phenotypes, genes and diseases - πŸ”¬ Enrichment analysis of genes and diseases in sets of HPO terms - Completely written in Rust, so it's **πŸš€blazingly fastπŸš€**TM ([Benchmarks](#benchmarks)) ## What is the current state? The library is pretty much feature-complete, at least for my use-cases. If you have any feature-requests, please open an Issue or get in touch. I'm very much interested in getting feedback and new ideas what to improve. The API is mostly stable, but I might refactor some parts a bit for easier use and performance gain. If you find this project interesting and want to contribute, please get in touch, I could definitely need some help. ## Documentation The public API is fully documented on [`docs.rs`](https://docs.rs/hpo/latest/hpo/) The main structs used in `hpo` are: - The [`Ontology`](https://docs.rs/hpo/latest/hpo/struct.Ontology.html) is the main struct and entrypoint in `hpo`. - [`HpoTerm`](https://docs.rs/hpo/latest/hpo/term/struct.HpoTerm.html) represents a single HPO term and contains plenty of functionality around them. - [`HpoSet`](https://docs.rs/hpo/latest/hpo/struct.HpoSet.html) is a collection of `HpoTerm`s, like a patient's clinical information. - [`Gene`](https://docs.rs/hpo/latest/hpo/annotations/struct.Gene.html) represents a single gene, including information about associated `HpoTerm`s. - [`OmimDisease`](https://docs.rs/hpo/latest/hpo/annotations/struct.OmimDisease.html) represents a single OMIM-diseases, including information about associated `HpoTerm`s. - [`OrphaDisease`](https://docs.rs/hpo/latest/hpo/annotations/struct.OrphaDisease.html) represents a single ORPHA-diseases, including information about associated `HpoTerm`s. The most relevant modules are: - [`annotations`](https://docs.rs/hpo/latest/hpo/annotations/index.html) contains the `Gene`, `OmimDisease` and `OrphaDisease` structs, and some related important types. - [`similarity`](https://docs.rs/hpo/latest/hpo/similarity/index.html) contains structs and helper functions for similarity comparisons for `HpoTerm` and `HpoSet`. - [`stats`](https://docs.rs/hpo/latest/hpo/stats/index.html) contains functions to calculate the hypergeometric enrichment score of genes or diseases. ## Examples Some (more or less random) examples are included in the [`examples` folder](https://github.com/anergictcell/hpo/tree/main/examples). ### Ontology ```rust use hpo::{Ontology, HpoTermId}; use hpo::annotations::{GeneId, OmimDiseaseId, OrphaDiseaseId}; fn example() { let ontology = Ontology::from_binary("tests/ontology.hpo").unwrap(); // iterate HPO terms for term in &ontology { // do something with term } // iterate Genes for gene in ontology.genes() { // do something with gene } // iterate omim diseases for disease in ontology.omim_diseases() { // do something with disease } // iterate orpha diseases for disease in ontology.orpha_diseases() { // do something with disease } // get a single HPO term using HPO ID let hpo_id = HpoTermId::try_from("HP:0000123").unwrap(); let term = ontology.hpo(hpo_id); // get a single HPO term using `u32` part of HPO ID let term = ontology.hpo(123u32); // get a single Omim disease let disease_id = OmimDiseaseId::from(12345u32); let disease = ontology.omim_disease(&disease_id); // get a single Orpha disease let disease_id = OrphaDiseaseId::from(12345u32); let disease = ontology.orpha_disease(&disease_id); // get a single Gene let hgnc_id = GeneId::from(12345u32); let gene = ontology.gene(&hgnc_id); // get a single Gene by its symbol let gene = ontology.gene_by_name("GBA"); } ``` ### HPO term ```rust use hpo::Ontology; fn example() { let ontology = Ontology::from_binary("tests/ontology.hpo").unwrap(); let term = ontology.hpo(123u32).unwrap(); assert_eq!("Abnormality of the nervous system", term.name()); assert_eq!("HP:000123".to_string(), term.id().to_string()); // iterate all parents for p in term.parents() { println!("{}", p.name()) } // iterate all children for p in term.children() { println!("{}", p.name()) } let term2 = ontology.hpo(1u32).unwrap(); assert!(term2.parent_of(&term)); assert!(term.child_of(&term2)); } ``` ### Similarity ```rust use hpo::Ontology; use hpo::similarity::GraphIc; use hpo::term::InformationContentKind; fn example() { let ontology = Ontology::from_binary("tests/ontology.hpo").unwrap(); let term1 = ontology.hpo(123u32).unwrap(); let term2 = ontology.hpo(1u32).unwrap(); let ic = GraphIc::new(InformationContentKind::Omim); let similarity = term1.similarity_score(&term2, &ic); } ``` ### Enrichment Identify which genes (or diseases) are enriched in a set of `HpoTerm`s, e.g. in the clinical information of a patient or patient cohort ```rust use hpo::Ontology; use hpo::{HpoSet, term::HpoGroup}; use hpo::stats::hypergeom::gene_enrichment; fn example() { let ontology = Ontology::from_binary("tests/ontology.hpo").unwrap(); let mut hpos = HpoGroup::new(); hpos.insert(2943u32); hpos.insert(8458u32); hpos.insert(100884u32); hpos.insert(2944u32); hpos.insert(2751u32); let patient_ci = HpoSet::new(&ontology, hpos); let mut enrichments = gene_enrichment(&ontology, &patient_ci); // the results are not sorted by default enrichments.sort_by(|a, b| { a.pvalue().partial_cmp(&b.pvalue()).unwrap() }); for gene in enrichments { println!("{}\t{}\t({})", gene.id(), gene.pvalue(), gene.enrichment()); } } ``` ## Benchmarks As the saying goes: "Make it work, make it good, make it fast". The *work* and *good* parts are realized in [PyHPO](https://pypi.org/project/pyhpo/). And even though I tried my best to make it *fast*, I was still hungry for more. So I started developing the `hpo` Rust library in December 2022. Even without micro-benchmarking and tuning performance as much as I did for `PyHPO`, `hpo` is indeed much much faster already now. The below benchmarks were run non scientificially and your mileage may vary. I used a MacBook Air M1, `rustc 1.68.0`, `Python 3.9` and `/usr/bin/time` for timing. | Benchmark | `PyHPO` | `hpo` (single-threaded) | `hpo` (multi-threaded) | | --------- | ----- | --- | --- | | Read and Parse Ontology | 6.4 s | 0.22 s | 0.22 s | | Similarity of 17,245 x 1,000 terms | 98.5 s | 4.6 s | 1.0 s | | Similarity of GBA1 to all Diseases | 380 s | 15.8 s | 3.0 s | | Disease enrichment in all Genes | 11.8 s | 0.4 s | 0.3 s | | Common ancestors of 17,245 x 10,000 terms | 225.2 s | 10.5 | 2.1 | ## Technical design There is some info about the plans for the implementation in the [Technical Design document](https://github.com/anergictcell/hpo/blob/main/TechnicalDesign.md)