
created_at: 2023-06-07 11:27:22.378986+00
updated_at: 2025-01-20 11:42:48.505559+00
description: A library to handle proteomic mass spectrometry data and match peptides to spectra.
owners (github:rusteomics:owners)



Match those fragments!

Handle mass spectrometry data in Rust. This crate is built to handle very complex peptides with a lot of ambiguity. It pivots around the [CompoundPeptidoform], [Peptidoform] and [LinearPeptide] types, which encode the ProForma specification. Additionally, this crate enables reading mgf files, annotating spectra (bottom-up, middle-down, and top-down; BU/MD/TD), finding isobaric sequences, aligning peptides, accessing the IMGT germline database, and reading identified peptide files.

Library features

  • Read ProForma sequences (complete specification supported: 'level 2-ProForma + top-down compliant + cross-linking compliant + glycans compliant + mass spectrum compliant')
  • Generate theoretical fragments with control over the fragmentation model from any ProForma peptidoform/proteoform
    • Generate theoretical fragments for chimeric spectra
    • Generate theoretical fragments for cross-links (also disulfides)
    • Generate theoretical fragments for modifications of unknown position
    • Generate peptide backbone (a, b, c, x, y, and z) and satellite ion fragments (w, d, and v)
    • Generate glycan fragments (B, Y, and internal fragments)
  • Integrated with mzdata for reading raw data files
  • Match spectra to the generated fragments
  • Align peptides based on mass
  • Fast access to the IMGT database of antibody germlines
  • Reading of multiple identified peptide file formats (Fasta, MaxQuant, MSFragger, Novor, OPair, Peaks, and Sage)
  • Exhaustively fuzz tested for reliability (using cargo-afl)
  • Extensive use of uom for compile time unit checking
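
To make the backbone fragment item concrete, here is a minimal standalone sketch of what generating theoretical b and y ions involves for an unmodified linear peptide. It does not use the rustyms API: `residue_mass` and `b_y_ions` are hypothetical helpers, the masses are rounded monoisotopic values, and only singly charged ions are produced.

```rust
/// Rounded monoisotopic residue masses (Da) for the 20 standard amino acids.
fn residue_mass(aa: char) -> Option<f64> {
    Some(match aa {
        'G' => 57.02146, 'A' => 71.03711, 'S' => 87.03203, 'P' => 97.05276,
        'V' => 99.06841, 'T' => 101.04768, 'C' => 103.00919, 'L' => 113.08406,
        'I' => 113.08406, 'N' => 114.04293, 'D' => 115.02694, 'Q' => 128.05858,
        'K' => 128.09496, 'E' => 129.04259, 'M' => 131.04049, 'H' => 137.05891,
        'F' => 147.06841, 'R' => 156.10111, 'Y' => 163.06333, 'W' => 186.07931,
        _ => return None,
    })
}

const WATER: f64 = 18.010565; // Da, monoisotopic H2O
const PROTON: f64 = 1.007276; // Da

/// Singly charged b and y ion m/z values for an unmodified linear peptide.
/// Returns `None` if the sequence contains an unknown residue.
fn b_y_ions(sequence: &str) -> Option<(Vec<f64>, Vec<f64>)> {
    let masses: Vec<f64> = sequence.chars().map(residue_mass).collect::<Option<_>>()?;
    let n = masses.len();

    // b ions: N-terminal prefixes, excluding the full-length peptide.
    let mut b = Vec::new();
    let mut prefix = 0.0;
    for m in masses.iter().take(n.saturating_sub(1)) {
        prefix += m;
        b.push(prefix + PROTON);
    }

    // y ions: C-terminal suffixes (with water), excluding the full-length peptide.
    let mut y = Vec::new();
    let mut suffix = 0.0;
    for m in masses.iter().skip(1).rev() {
        suffix += m;
        y.push(suffix + WATER + PROTON);
    }
    Some((b, y))
}

fn main() {
    let (b, y) = b_y_ions("IVQEVT").expect("only standard residues");
    for (i, mz) in b.iter().enumerate() {
        println!("b{} {:.4}", i + 1, mz);
    }
    for (i, mz) in y.iter().enumerate() {
        println!("y{} {:.4}", i + 1, mz);
    }
}
```

The crate itself covers far more than this sketch (modifications, cross-links, glycans, other ion series, and higher charge states), which is why the fragmentation model is configurable.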

Example usage

# fn main() -> Result<(), rustyms::error::CustomError> {
# let raw_file_path = "data/annotated_example.mgf";
use rustyms::{*, system::{usize::Charge, e}};
// Open example raw data (this is the built in mgf reader, look into mzdata for more advanced raw file readers)
let spectrum = rawfile::mgf::open(raw_file_path)?;
// Parse the given ProForma definition
let peptide = CompoundPeptidoformIon::pro_forma("[Gln->pyro-Glu]-QVQEVSERTHGGNFD", None)?;
// Generate theoretical fragments for this peptide given EThcD fragmentation
let model = Model::ethcd();
let fragments = peptide.generate_theoretical_fragments(Charge::new::<e>(2), &model);
// Annotate the raw data with the theoretical fragments
let annotated = spectrum[0].annotate(peptide, &fragments, &model, MassMode::Monoisotopic);
// Calculate a peak false discovery rate for this annotation 
let (fdr, _) = annotated.fdr(&fragments, &model, MassMode::Monoisotopic);
// This is the incorrect sequence for this spectrum so the peak FDR will indicate this
# dbg!(&fdr, fdr.peaks_sigma(), fdr.peaks_fdr(), fdr.peaks_score());
assert!(fdr.peaks_sigma() > 2.0);
# Ok(()) }

# fn main() -> Result<(), rustyms::error::CustomError> {
use rustyms::{*, align::*};
// Check how this peptide compares to a similar peptide (using the feature `align`)
let first_peptide = Peptidoform::pro_forma("IVQEVT", None)?.into_simple_linear().unwrap();
let second_peptide = Peptidoform::pro_forma("LVQVET", None)?.into_simple_linear().unwrap();
// Align the two peptides using mass based alignment
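// IVQEVT
// LVQVET
// ─  ╶╴
let alignment = align::<4, SimpleLinear, SimpleLinear>(
    &first_peptide,
    &second_peptide,
    // … (remaining arguments truncated in the source)
);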
# dbg!(&alignment);
// Calculate some more statistics on this alignment
let stats = alignment.stats();
assert_eq!(stats.mass_similar, 6); // 6 out of the 6 positions are mass similar
# Ok(()) }
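
Why all 6 positions count as mass similar can be illustrated with a small standalone sketch: I and L are isobaric, and the swapped pair EV/VE has the same summed mass. This illustrates the idea only and is not the crate's alignment algorithm. `stretch_mass` and `mass_similar` are hypothetical helpers, the masses are rounded monoisotopic values, and the 10 ppm tolerance is an assumption.

```rust
/// Rounded monoisotopic residue masses (Da) for the 20 standard amino acids.
fn residue_mass(aa: char) -> Option<f64> {
    Some(match aa {
        'G' => 57.02146, 'A' => 71.03711, 'S' => 87.03203, 'P' => 97.05276,
        'V' => 99.06841, 'T' => 101.04768, 'C' => 103.00919, 'L' => 113.08406,
        'I' => 113.08406, 'N' => 114.04293, 'D' => 115.02694, 'Q' => 128.05858,
        'K' => 128.09496, 'E' => 129.04259, 'M' => 131.04049, 'H' => 137.05891,
        'F' => 147.06841, 'R' => 156.10111, 'Y' => 163.06333, 'W' => 186.07931,
        _ => return None,
    })
}

/// Summed residue mass of a stretch of amino acids; `None` on unknown residues.
fn stretch_mass(stretch: &str) -> Option<f64> {
    stretch.chars().map(residue_mass).sum()
}

/// Whether two stretches have the same mass within a ppm tolerance.
fn mass_similar(a: &str, b: &str, tolerance_ppm: f64) -> Option<bool> {
    let (mass_a, mass_b) = (stretch_mass(a)?, stretch_mass(b)?);
    let ppm = (mass_a - mass_b).abs() / mass_a.max(mass_b) * 1e6;
    Some(ppm <= tolerance_ppm)
}

fn main() {
    // IVQEVT versus LVQVET, split into blocks that line up by mass.
    let blocks = [("I", "L"), ("V", "V"), ("Q", "Q"), ("EV", "VE"), ("T", "T")];
    for &(a, b) in &blocks {
        println!("{} ~ {}: {:?}", a, b, mass_similar(a, b, 10.0));
    }
    // Stretches of different lengths can be isobaric too: GG and N share a composition.
    println!("GG ~ N: {:?}", mass_similar("GG", "N", 10.0));
}
```

Comparing stretches of more than one residue is what lets a mass based alignment see through swaps such as EV/VE that a purely positional comparison would count as two mismatches.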

Compilation features

Rustyms ties together multiple smaller modules into one cohesive structure. It has multiple features which allow you to slim it down if needed (all are enabled by default).

  • align - gives access to mass based alignment of peptides.
  • identification - gives access to methods reading many different identified peptide formats.
  • imgt - enables access to the IMGT database of antibody germline sequences, with annotations.
  • isotopes - gives access to generation of an averagine model for isotopes; this also pulls in two additional dependencies.
  • rand - allows the generation of random peptides.
  • rayon - enables parallel iterators using rayon, mostly for imgt but also in consecutive align.
  • mzdata - enables integration with mzdata which has more advanced raw file support.
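
Since every feature is on by default, slimming the crate down means switching the defaults off and naming the features to keep. A minimal Cargo.toml sketch follows; the `"*"` version requirement is a placeholder and the selected features are only an example. Whether any of these features depend on one another is not stated here, so check the crate's manifest before relying on a particular combination.

```toml
[dependencies]
# "*" is a placeholder version requirement; pin the release you actually use.
rustyms = { version = "*", default-features = false, features = ["align", "identification"] }
```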
Commit count: 776
