Peptide LC-MS features in MasPy
Chromatographic separation coupled directly to the mass spectrometer causes analytes to appear over time as they emerge from the chromatography column, with a more or less Gaussian peak shape. The elution profile of an analyte can be reconstructed from the spectral information present in the MS1 scans. The simplest way to do this is to extract the intensities of an ion species with a given m/z value from consecutive MS1 scans for as long as the ion is detectable. Combining the extracted intensities with the respective MS1 retention times yields a so-called extracted ion chromatogram (EIC or XIC). The area obtained by integrating the XIC is frequently used as a measure of abundance in label-free quantification (LFQ) and stable isotope labeling (SIL) workflows.
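The XIC construction described above can be sketched in a few lines of plain Python. This is only an illustration and not MasPy code: the scan layout, the function names `extract_xic` and `integrate_xic`, the 10 ppm tolerance and the toy data are all assumptions made for this example. For simplicity, scans without a matching peak contribute an intensity of zero instead of ending the trace.

```python
from typing import List, Sequence, Tuple

# Assumed layout: each MS1 scan is (retention_time, [(mz, intensity), ...]).
Scan = Tuple[float, Sequence[Tuple[float, float]]]


def extract_xic(scans: Sequence[Scan], target_mz: float,
                tol_ppm: float = 10.0) -> List[Tuple[float, float]]:
    """Return (retention time, intensity) pairs for target_mz.

    For every scan the most intense peak within the ppm tolerance is used;
    scans without a matching peak contribute an intensity of zero.
    """
    tol = target_mz * tol_ppm * 1e-6
    xic = []
    for rt, peaks in scans:
        matching = [inten for mz, inten in peaks if abs(mz - target_mz) <= tol]
        xic.append((rt, max(matching, default=0.0)))
    return xic


def integrate_xic(xic: Sequence[Tuple[float, float]]) -> float:
    """Integrate the XIC over retention time with the trapezoidal rule."""
    area = 0.0
    for (rt0, i0), (rt1, i1) in zip(xic, xic[1:]):
        area += 0.5 * (i0 + i1) * (rt1 - rt0)
    return area


# Toy data (made up): one ion at m/z 500.25 eluting over five scans.
scans = [
    (10.0, [(500.2501, 0.0), (620.1, 5.0)]),
    (11.0, [(500.2499, 100.0)]),
    (12.0, [(500.2502, 300.0), (500.9, 40.0)]),
    (13.0, [(500.2500, 100.0)]),
    (14.0, [(620.1, 7.0)]),
]
xic = extract_xic(scans, target_mz=500.25)
area = integrate_xic(xic)
```

The trapezoidal rule is used here because it needs no assumption about the peak shape; fitting a Gaussian to the trace would be an alternative when the data are noisy.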
In MS spectra each peptide species appears as an isotope envelope of multiple ion species with different m/z values. Combining the XICs of the different isotope states of the same analyte allows its charge state, and therefore its mass, to be inferred. In addition, the extra information yields more accurate intensity area estimates and thus more accurate quantification. The combined information of XICs and isotope clusters is referred to as a peptide LC-MS feature or, more commonly, simply as a feature.
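Charge inference from an isotope envelope relies on the fact that neighbouring isotope peaks are separated by roughly 1.00335 / z on the m/z axis. The following sketch illustrates that idea only; the function name `infer_charge`, the tolerance, the maximum charge and the example envelope are assumptions for this example and do not describe how any particular feature detector works.

```python
# Mass difference between 13C and 12C in Dalton, a common approximation
# for the spacing between neighbouring peptide isotope peaks.
ISOTOPE_SPACING = 1.003355


def infer_charge(isotope_mzs, max_charge=6, tol=0.01):
    """Infer the charge state from the mean spacing of isotope peaks.

    Returns None if fewer than two peaks are given or if no charge state
    up to max_charge explains the observed spacing within tol (in m/z).
    """
    if len(isotope_mzs) < 2:
        return None
    mzs = sorted(isotope_mzs)
    spacing = (mzs[-1] - mzs[0]) / (len(mzs) - 1)
    for charge in range(1, max_charge + 1):
        if abs(spacing - ISOTOPE_SPACING / charge) <= tol:
            return charge
    return None


# Made-up envelope with a spacing of about 0.5017, as expected for z = 2.
envelope = [500.7500, 501.2517, 501.7534]
charge = infer_charge(envelope)
```

Once the charge is known, the mass follows from the m/z value of the monoisotopic peak and the charge.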
Representation of LC-MS features in MasPy
Peptide LC-MS features, but also XICs, are represented in MasPy with the feature item class
maspy.core.Fi. Its structure is kept very simple, similar to
Sii, with only a few
mandatory attributes. Each instance is uniquely identified by the combination of its
Fi.id and Fi.specfile attributes. However, the
id attribute is
not associated with any particular MS scan, unlike the id of an
Sii. Further attributes that should always be supplied when importing
features into MasPy are
Although not absolutely mandatory, the
charge attribute should also be
supplied whenever possible, since the charge information is used in some
FiContainer is used to store feature items
of one or multiple specfiles. The container allows saving and loading of
imported results and provides methods for convenient access to the data.
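The identification scheme described above, where an item is addressed by its specfile together with its id, can be illustrated with a small stand-in data structure. `FeatureItem` and `FeatureStore` are hypothetical classes written for this example only; they are not maspy.core.Fi or FiContainer and say nothing about how MasPy stores its data internally.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, Optional


@dataclass
class FeatureItem:
    """Hypothetical stand-in for a feature item; not maspy.core.Fi."""
    id: str
    specfile: str
    mz: float
    rt: float
    intensity: float
    charge: Optional[int] = None  # optional, but useful whenever known


class FeatureStore:
    """Hypothetical stand-in for a container; not maspy.core.FiContainer."""

    def __init__(self) -> None:
        # specfile name -> feature id -> item
        self._items: Dict[str, Dict[str, FeatureItem]] = {}

    def add(self, item: FeatureItem) -> None:
        per_file = self._items.setdefault(item.specfile, {})
        if item.id in per_file:
            raise KeyError(f"duplicate feature {item.id!r} in {item.specfile!r}")
        per_file[item.id] = item

    def get(self, specfile: str, item_id: str) -> FeatureItem:
        return self._items[specfile][item_id]

    def iter_items(self, specfile: Optional[str] = None) -> Iterator[FeatureItem]:
        """Yield the items of one specfile, or of all specfiles if None."""
        names = [specfile] if specfile is not None else sorted(self._items)
        for name in names:
            yield from self._items.get(name, {}).values()


store = FeatureStore()
# The same id may occur in different specfiles without conflict.
store.add(FeatureItem("1", "run_a", mz=500.75, rt=120.0, intensity=1e6, charge=2))
store.add(FeatureItem("1", "run_b", mz=622.31, rt=340.5, intensity=4e5))
```

Keying by specfile first keeps the items of one run together, which makes per-run access cheap.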
Attribute naming conventions in MasPy and additional attributes that might be necessary for working with feature items:
#TODO: maybe change mz to obsMz to be consistent between data types
mz: the experimentally observed mass to charge ratio (Dalton / charge). Normally the m/z value of the monoisotopic peak.
rt: the retention time center of the feature.
rtLow: the lower retention time boundary of the feature.
rtHigh: the upper retention time boundary of the feature.
intensity: an estimator for the feature abundance. The preferred value is the integrated intensity area, but the feature apex intensity is also possible.
charge: the charge state of the feature.
peptide: the peptide sequence of the Sii that is used for annotating the feature.
sequence: the plain amino acid sequence of the Sii that is used for annotating the feature.
score (or any other score attribute name): taken from the Sii that is used for annotating the feature. It describes the quality of a spectrum identification.
obsMz: the experimentally observed mass to charge ratio of the feature (Dalton / charge). Normally the m/z value of the monoisotopic peak.
obsMh: the experimentally observed mass to charge ratio of the feature, calculated for the singly protonated ion (Dalton / charge). Normally the monoisotopic peak.
obsMass: the experimentally observed non-protonated mass of the feature, calculated from the mz and charge values (Dalton). Normally the monoisotopic mass.
excMz: the exact calculated mass to charge ratio of the peptide (Dalton / charge). Normally the monoisotopic ion.
excMh: the exact calculated mass to charge ratio of the peptide, calculated for the singly protonated state (Dalton / charge). Normally the monoisotopic ion.
excMass: the exact calculated mass of the non-protonated peptide (Dalton). Normally the monoisotopic mass.
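The relationship between the m/z, Mh and mass variants listed above is simple arithmetic with the proton mass. The helpers below are a sketch written for this document and are not MasPy functions; their names and the example values are assumptions, and the proton mass is rounded to six decimals.

```python
PROTON_MASS = 1.007276  # Dalton, rounded


def mass_from_mz(mz, charge):
    """Non-protonated mass calculated from an m/z value and a charge."""
    return (mz - PROTON_MASS) * charge


def mh_from_mz(mz, charge):
    """m/z of the singly protonated ion for the same analyte."""
    return mass_from_mz(mz, charge) + PROTON_MASS


def mz_from_mass(mass, charge):
    """m/z of the ion carrying `charge` protons."""
    return mass / charge + PROTON_MASS


def ppm_error(obs_mz, exc_mz):
    """Relative deviation of an observed from an exact m/z value, in ppm."""
    return (obs_mz - exc_mz) / exc_mz * 1e6


# Made-up example: a doubly charged feature observed at m/z 500.75.
obs_mz, charge = 500.75, 2
obs_mass = mass_from_mz(obs_mz, charge)
obs_mh = mh_from_mz(obs_mz, charge)
```

Comparing an observed value with its exact counterpart, for example with `ppm_error(obsMz, excMz)`, is a common way to judge whether an annotation is plausible.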
MasPy internal feature item attributes:
isValid: can be used to flag if a Fi has passed a given quality threshold.
isMatched: can be used to flag if a Fi has been matched to any Si or Sii elements.
isAnnotated: can be used to flag if a Fi has been annotated with a Sii element and therefore with an identified peptide sequence.
siIds: a list of Si elements that have been matched to the feature item.
siiIds: a list of Sii elements that have been matched to the feature item.
Supported feature detection algorithms
Currently MasPy supports the import of two feature-containing file types: the
openMS feature file format
.featureXML and the
feature file generated by the open source tool Dinosaur. However, adding import routines for
additional file formats should be trivial and can be done on demand.
The FeatureFinderCentroided node from openMS is one of the best established open source LC-MS feature defining algorithms. It can be used independently of a data analysis pipeline and other processing steps. It was published in 2013 as part of a complete openMS pipeline in "An Automated Pipeline for High-Throughput Label-Free Quantitative Proteomics". Since its publication it has been applied in numerous studies and has been reused in at least two additional open source projects: DeMix and DeMix-Q.
"Dinosaur: A Refined Open-Source Peptide MS Feature Detector", published in 2016, describes an algorithm based on the graph model concept for feature detection introduced by MaxQuant in 2008. Dinosaur seems to provide similar or better results than the FeatureFinderCentroided node of openMS, with a substantial increase in runtime performance. It is available on GitHub.
Basic code examples
Importing peptide features
maspy.reader.importPeptideFeatures() is used to import LC-MS
features from a file. It automatically recognises the file type by the file
name extension and executes the respective import routine. The file
extension therefore has to be either
.featurexml (openMS) or
the extension of Dinosaur feature files; the check is not case sensitive. The imported feature items are stored in
the FiContainer instance passed to the function.
    import maspy.core
    import maspy.reader

    fiContainer = maspy.core.FiContainer()
    maspy.reader.importPeptideFeatures(fiContainer, 'filelocation/f.featureXML',
                                       'specfile_name_1')
Matching spectrum identification items to feature items
The peptide underlying an LC-MS feature can be determined by using the
information of identified MSn scans. In MasPy this can be achieved by using
maspy.featuremethods.matchToFeatures(), which allows matching Sii elements to
Fi elements by comparing their m/z, retention time and charge information.
User defined tolerance values for matching can be passed to the function; for
details see the docstring documentation. However, the default settings should be
appropriate for typical high resolution MS1 data as obtained by Thermo Orbitrap
instruments.
#TODO: describe the print output
    >>> import maspy.featuremethods
    >>> maspy.featuremethods.matchToFeatures(fiContainer, siiContainer,
    ...                                      specfiles='specfile_name_1')
    ------ specfile_name_1 ------
    Annotated features: 3802 / 20437 = 18.6 %
    Spectra matched to features: 4240 / 4898 = 86.6 %
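The general idea of matching an identification to features by m/z, retention time and charge can be sketched as follows. This is not the implementation of maspy.featuremethods.matchToFeatures(); the classes `Feature` and `Identification`, the function `match_to_features`, its tolerance parameters and their default values are all hypothetical and chosen only to make the comparison logic concrete.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Feature:
    """Hypothetical stand-in for a feature item; not maspy.core.Fi."""
    id: str
    mz: float
    rt_low: float
    rt_high: float
    charge: Optional[int]


@dataclass
class Identification:
    """Hypothetical stand-in for a spectrum identification; not Sii."""
    id: str
    mz: float
    rt: float
    charge: int


def match_to_features(ident: Identification, features: List[Feature],
                      mz_tol_ppm: float = 5.0, rt_tol: float = 0.0) -> List[str]:
    """Return the ids of all features compatible with the identification.

    A feature matches if its m/z agrees within mz_tol_ppm, the retention
    time of the identification lies inside the feature boundaries (widened
    by rt_tol) and the charge states agree. Features without charge
    information are not rejected on charge.
    """
    matches = []
    for feature in features:
        mz_ok = abs(ident.mz - feature.mz) / feature.mz * 1e6 <= mz_tol_ppm
        rt_ok = feature.rt_low - rt_tol <= ident.rt <= feature.rt_high + rt_tol
        charge_ok = feature.charge is None or feature.charge == ident.charge
        if mz_ok and rt_ok and charge_ok:
            matches.append(feature.id)
    return matches


# Made-up example data.
features = [
    Feature("f1", mz=500.7500, rt_low=100.0, rt_high=130.0, charge=2),
    Feature("f2", mz=500.7501, rt_low=200.0, rt_high=230.0, charge=2),
    Feature("f3", mz=500.7500, rt_low=100.0, rt_high=130.0, charge=3),
]
ident = Identification("s1", mz=500.7505, rt=115.0, charge=2)
matched = match_to_features(ident, features)
```

In this toy example only the first feature satisfies all three criteria; the second fails on retention time and the third on charge.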
#TODO: describe which attributes must be present in the Sii items and link to the tutorial that describes how to obtain these attributes. #charge, m/z, retention time
Accessing data stored in a FiContainer
#TODO: describe .getItem(), .getArrays()