Peptide LC-MS features in MasPy
When chromatographic separation is coupled directly to the mass spectrometer, analytes appear over time as they emerge from the chromatography column, with a more or less Gaussian peak shape. The elution profile of an analyte can be reconstructed from the spectral information present in the MS1 scans. The simplest way to do this is to extract the intensities of an ion species with a given m/z value from consecutive MS1 scans for as long as the ion is detectable. Combining the extracted intensities with the respective MS1 retention times yields a so-called extracted ion chromatogram (EIC or XIC). The area obtained by integrating the XIC is frequently used as a measure of abundance in label-free quantification (LFQ) and stable isotope labeling (SIL) workflows.
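The extraction and integration steps can be illustrated with a minimal sketch in plain Python. It is not part of MasPy: the function names, the scan representation (a retention time plus a list of (m/z, intensity) peaks) and the 10 ppm tolerance are assumptions made for this example only. For simplicity it walks over all scans instead of stopping once the ion is no longer detectable.

```python
def extract_xic(scans, target_mz, tol_ppm=10.0):
    """Collect the intensity of one ion species across consecutive MS1 scans.

    scans: list of (retention_time, [(mz, intensity), ...]) tuples
           (hypothetical representation, chosen for this sketch).
    """
    tolerance = target_mz * tol_ppm * 1e-6  # absolute m/z tolerance
    rts, intensities = [], []
    for rt, peaks in scans:
        # Sum all peaks inside the tolerance window; 0.0 if none is present
        signal = sum(i for mz, i in peaks if abs(mz - target_mz) <= tolerance)
        rts.append(rt)
        intensities.append(float(signal))
    return rts, intensities


def integrate_xic(rts, intensities):
    """Integrate the XIC with the trapezoidal rule to obtain an intensity area."""
    area = 0.0
    for k in range(1, len(rts)):
        area += (rts[k] - rts[k - 1]) * (intensities[k] + intensities[k - 1]) / 2.0
    return area


example_scans = [
    (0.0, [(500.2500, 0.0), (612.3000, 50.0)]),
    (1.0, [(500.2501, 10.0), (612.3000, 40.0)]),
    (2.0, [(500.2499, 20.0)]),
    (3.0, [(500.2500, 10.0)]),
    (4.0, [(612.3000, 5.0)]),
]
xic_rts, xic_intensities = extract_xic(example_scans, 500.25)
xic_area = integrate_xic(xic_rts, xic_intensities)
```

Here `xic_intensities` holds one value per scan, and `xic_area` is the kind of integrated value that could serve as an abundance estimate.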
In MS spectra, each peptide species appears as an isotope envelope of multiple ion species with different m/z values. Combining the XICs of the different isotope states of the same analyte allows its charge state, and therefore its mass, to be inferred. In addition, the extra information yields more accurate intensity area estimates and thus more accurate quantification. The combined information of XICs and isotope clusters is referred to as a peptide LC-MS feature, or more commonly simply as a feature.
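How the charge state follows from an isotope envelope can be sketched as follows: neighbouring isotope peaks differ by roughly one 13C/12C mass difference (about 1.00335 Da), so their m/z spacing is approximately that difference divided by the charge. The helper below is a hypothetical illustration, not a MasPy function, and the upper charge limit is an arbitrary assumption.

```python
C13_C12_MASS_DIFF = 1.0033548  # Da, mass difference between 13C and 12C


def infer_charge(isotope_mzs, max_charge=6):
    """Estimate the charge state from the m/z spacing of isotope peaks."""
    mzs = sorted(isotope_mzs)
    if len(mzs) < 2:
        raise ValueError('at least two isotope peaks are required')
    # Average spacing between consecutive isotope peaks
    spacings = [high - low for low, high in zip(mzs, mzs[1:])]
    mean_spacing = sum(spacings) / len(spacings)
    charge = round(C13_C12_MASS_DIFF / mean_spacing)
    if not 1 <= charge <= max_charge:
        raise ValueError('isotope spacing does not fit a plausible charge')
    return charge


# Peaks spaced by about 0.5017 m/z units indicate a doubly charged ion
charge = infer_charge([500.7500, 501.2517, 501.7533])
```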
Representation of LC-MS features in MasPy
Peptide LC-MS features, but also XICs, are represented in MasPy with the feature item class maspy.core.Fi. Its structure is kept very simple, similar to Si and Sii, with only a few mandatory attributes. Each instance is uniquely identified by the combination of the Fi.id and Fi.specfile attributes. However, unlike for Si and Sii, the id attribute is not associated with any particular MS scan. Further attributes that should always be supplied when importing features into MasPy are mz, rt, rtLow, rtHigh and intensity.
Although not strictly mandatory, the charge attribute should also be supplied whenever possible, since the charge information is used by some algorithms.
The FiContainer is used to store feature items of one or multiple specfiles. The container allows saving and loading of imported results and provides methods for convenient access to the data.
Attribute naming conventions in MasPy and additional attributes that might be necessary for working with feature items:
#TODO: maybe change mz to obsMz to be consistent between data types
mz
the experimentally observed mass to charge ratio (Dalton / charge). Normally the m/z value of the monoisotopic peak.
rt
the retention time center of the feature.
rtLow
the lower retention time boundary of the feature.
rtHigh
the upper retention time boundary of the feature.
intensity
an estimator for the feature abundance. The preferred value is the integrated intensity area, but the feature apex intensity is also possible.
charge
the charge state of the feature.
peptide
the peptide sequence of the Sii that is used for annotating the feature.
sequence
the plain amino acid sequence of the Sii that is used for annotating the feature.
score
or any other score attribute name of the Sii that is used for annotating the feature. It describes the quality of a spectrum identification.
obsMz
the experimentally observed mass to charge ratio of the feature (Dalton / charge). Normally the m/z value of the monoisotopic peak.
obsMh
the experimentally observed mass to charge ratio of the feature, calculated for the mono protonated ion (Dalton / charge). Normally the monoisotopic peak.
obsMass
the experimentally observed non-protonated mass of a feature, calculated from the mz and charge values (Dalton). Normally the monoisotopic mass.
excMz
the exact calculated mass to charge ratio of the peptide (Dalton / charge). Normally the monoisotopic ion.
excMh
the exact calculated mass to charge ratio of the peptide, calculated for the mono protonated state (Dalton / charge). Normally the monoisotopic ion.
excMass
the exact calculated mass of the non-protonated peptide (Dalton). Normally the monoisotopic mass.
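The relation between an m/z value, the mono protonated mass and the non-protonated mass described above can be written out in a few lines. This is a minimal sketch of the standard conversion for protonated ions; the function names are hypothetical and are not claimed to be MasPy functions.

```python
PROTON_MASS = 1.00727646688  # mass of a proton in Dalton


def calc_mass_from_mz(mz, charge):
    """Non-protonated (neutral) mass from an m/z value and a charge state."""
    return (mz - PROTON_MASS) * charge


def calc_mh_from_mz(mz, charge):
    """m/z of the mono protonated ion (charge 1) from an m/z value and charge."""
    return calc_mass_from_mz(mz, charge) + PROTON_MASS


def calc_mz_from_mass(mass, charge):
    """m/z value of a non-protonated mass carrying `charge` protons."""
    return mass / charge + PROTON_MASS


# Values corresponding to obsMz, obsMass and obsMh of a doubly charged feature
obs_mz, obs_charge = 500.75, 2
obs_mass = calc_mass_from_mz(obs_mz, obs_charge)
obs_mh = calc_mh_from_mz(obs_mz, obs_charge)
```

The same functions apply to the calculated values (excMz, excMh, excMass) when starting from the theoretical peptide mass.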
MasPy internal feature item attributes:
isValid
can be used to flag if a Fi has passed a given quality threshold.
isMatched
can be used to flag if a Fi has been matched to any Si or Sii elements.
isAnnotated
can be used to flag if a Fi has been annotated with a Sii element and therefore with an identified peptide sequence.
siIds
a list of Si elements that have been matched to the feature item.
siiIds
a list of Sii elements that have been matched to the feature item.
Supported feature detection algorithms
Currently MasPy supports the import of two file types that contain features: the openMS feature file format .featureXML and the .feature.tsv format generated by the open source tool Dinosaur. However, adding import routines for additional file formats should be straightforward and can be done on demand.
The FeatureFinderCentroided node from openMS is one of the best established open source algorithms for defining LC-MS features. It can be used independently of a data analysis pipeline and other processing steps. It was published in 2013 as part of a complete openMS pipeline: An Automated Pipeline for High-Throughput Label-Free Quantitative Proteomics. Since its publication it has been applied in numerous studies and has been reused in at least two additional open source projects: DeMix and DeMix-Q.
Dinosaur: A Refined Open-Source Peptide MS Feature Detector, published in 2016, describes an algorithm based on the graph model concept for feature detection introduced by MaxQuant in 2008. Dinosaur seems to provide similar or better results than the FeatureFinderCentroided node of openMS, with a substantially shorter runtime. It is available on GitHub.
Basic code examples
Importing peptide features
The function maspy.reader.importPeptideFeatures() is used to import LC-MS features from a file. It automatically recognises the file type by the file name extension and executes the respective import routine. The file extension therefore has to be either .featurexml (openMS) or .feature.tsv (Dinosaur); the comparison is not case sensitive. The imported feature items are stored in the FiContainer instance passed to the function.
import maspy.core
import maspy.reader

# Create an empty container and import the features of one specfile into it
fiContainer = maspy.core.FiContainer()
maspy.reader.importPeptideFeatures(fiContainer, 'filelocation/f.featureXML',
                                   'specfile_name_1')
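The case insensitive recognition of the file type by its extension can be pictured with the following sketch. It only illustrates the idea; the helper name and its return values are hypothetical and do not reproduce the actual MasPy implementation.

```python
def detect_feature_file_type(filelocation):
    """Return a label for the feature file type based on the file extension.

    Hypothetical helper for illustration; the comparison ignores case.
    """
    lowered = filelocation.lower()
    if lowered.endswith('.featurexml'):
        return 'featurexml'      # openMS feature file
    if lowered.endswith('.feature.tsv'):
        return 'feature.tsv'     # Dinosaur feature file
    raise ValueError('unsupported feature file extension: ' + filelocation)


file_type = detect_feature_file_type('filelocation/f.featureXML')
```

Any other extension raises a `ValueError` in this sketch, so an unsupported file is reported instead of being silently ignored.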
Matching spectrum identification items to feature items
The peptide underlying an LC-MS feature can be determined by using the information of identified MSn scans. In MasPy this can be achieved with maspy.featuremethods.matchToFeatures(), which matches Sii to Fi elements by comparing their m/z, retention time and charge information. User defined tolerance values for matching can be passed to the function; for details see its docstring. However, the default settings should be appropriate for typical high resolution MS1 data as obtained with Thermo Orbitrap instruments.
#TODO: describe the print output
>>> import maspy.featuremethods
>>> maspy.featuremethods.matchToFeatures(fiContainer, siiContainer,
...                                      specfiles='specfile_name_1')
------ specfile_name_1 ------
Annotated features: 3802 / 20437 = 18.6 %
Spectra matched to features: 4240 / 4898 = 86.6 %
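The matching criteria (m/z, retention time and charge) can be illustrated with a simplified, self-contained sketch. It does not reproduce the logic of matchToFeatures(): the dictionary layout, the function names and the 5 ppm tolerance are assumptions chosen for this example.

```python
def is_match(feature, ident, mz_tol_ppm=5.0):
    """Check whether a spectrum identification falls inside a feature."""
    if feature['charge'] != ident['charge']:
        return False
    # The identified scan must lie within the feature's elution boundaries
    if not feature['rtLow'] <= ident['rt'] <= feature['rtHigh']:
        return False
    # Relative m/z deviation in parts per million
    deviation_ppm = abs(ident['mz'] - feature['mz']) / feature['mz'] * 1e6
    return deviation_ppm <= mz_tol_ppm


def match_to_features(features, idents, mz_tol_ppm=5.0):
    """Map each feature id to the ids of all matching identifications."""
    return {
        feature['id']: [ident['id'] for ident in idents
                        if is_match(feature, ident, mz_tol_ppm)]
        for feature in features
    }


features = [{'id': 'f1', 'mz': 500.7500, 'rtLow': 29.5, 'rtHigh': 31.0,
             'charge': 2}]
idents = [
    {'id': 's1', 'mz': 500.7510, 'rt': 30.2, 'charge': 2},  # matches
    {'id': 's2', 'mz': 500.7510, 'rt': 35.0, 'charge': 2},  # outside rt range
    {'id': 's3', 'mz': 500.7600, 'rt': 30.2, 'charge': 2},  # m/z too far off
    {'id': 's4', 'mz': 500.7510, 'rt': 30.2, 'charge': 3},  # wrong charge
]
matches = match_to_features(features, idents)
```

In this sketch only the first identification fulfils all three criteria; a feature with at least one matched identification would count as annotated.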
Note
#TODO: describe which attributes must be present in the Sii items and link to the tutorial that describes how to obtain these attributes. #charge, m/z, retention time
Accessing data stored in a FiContainer
#TODO: describe .getItem(), .getArrays()