maspy package
Submodules
maspy.auxiliary module
A collection of helper functions used in different modules. The functions deal, for example, with saving files, encoding and decoding data in the JSON format, filtering numpy arrays, and data fitting.
-
class maspy.auxiliary.DataFit(dependentVarInput, independentVarInput)
Bases: object
#TODO: docstring
Parameters:
- splines – #TODO: docstring
- splineCycles – #TODO: docstring
- splineMinKnotPoins – #TODO: docstring
- splineOrder – #TODO: docstring
- splineInitialKnots – #TODO: docstring
- splineSubsetPercentage – #TODO: docstring
- splineTerminalExpansion – #TODO: docstring
- dependentVar – #TODO: docstring
- independentVar – #TODO: docstring
-
class maspy.auxiliary.MaspyJsonEncoder(skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)
Bases: json.encoder.JSONEncoder
Extension of json.JSONEncoder to serialize MasPy classes.
MasPy classes need to define a _reprJSON() method, which returns a JSON serializable object.
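The delegation to _reprJSON() can be illustrated with a minimal sketch. The names ReprJsonEncoder and Point are hypothetical stand-ins, and the actual MaspyJsonEncoder implementation may differ in detail:

```python
import json


class ReprJsonEncoder(json.JSONEncoder):
    """Hypothetical stand-in for MaspyJsonEncoder (an assumption, not MasPy code)."""

    def default(self, obj):
        # Delegate to the object's own _reprJSON() method if it defines one.
        if hasattr(obj, '_reprJSON'):
            return obj._reprJSON()
        # Otherwise fall back to the base class, which raises TypeError.
        return super().default(obj)


class Point:
    """Hypothetical example class that knows how to describe itself."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def _reprJSON(self):
        # Must return something json can serialize on its own.
        return {'__Point__': [self.x, self.y]}


encoded = json.dumps({'p': Point(1, 2)}, cls=ReprJsonEncoder)
```

Because json calls default() only for objects it cannot serialize itself, built-in types are unaffected by the extension.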
-
class maspy.auxiliary.Memoize(function)
Bases: object
A general memoization class; specify a function when creating a new instance of the class. The function's return value is returned and stored in self.memo when the instance is called with an argument for the first time. Later calls with the same argument return the cached value instead of calling the function again.
Variables:
- function – when Memoize is called, this function's return value is returned.
- memo – a dictionary that records the function return values for arguments that have already been used.
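The caching behaviour described above can be sketched as follows. MemoizeSketch and slow_square are hypothetical names used for illustration only, and the sketch assumes a single hashable argument:

```python
class MemoizeSketch:
    """Illustrative memoization wrapper (an assumption, not the MasPy source)."""

    def __init__(self, function):
        self.function = function
        self.memo = {}

    def __call__(self, arg):
        # Call the wrapped function only for arguments not seen before.
        if arg not in self.memo:
            self.memo[arg] = self.function(arg)
        return self.memo[arg]


calls = []


def slow_square(n):
    calls.append(n)  # record every real invocation
    return n * n


square = MemoizeSketch(slow_square)
first = square(4)   # computed and stored in square.memo
second = square(4)  # served from square.memo
```

After both calls the wrapped function has run only once, and square.memo holds the cached result.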
-
class maspy.auxiliary.PartiallySafeReplace
Bases: object
Indirectly overwrite files by writing to temporary files and replacing them at once.
This is a context manager. When the context is entered, subsequently opened files will actually open temporary files. Each time the same file path is opened, the same temporary file will be used.
When the context is closed, it will attempt to replace the original files with the content of the temporary files. Thus, several files can be prepared with less risk of data loss. Data loss is still possible if the replacement operation fails (due to locking, not handled yet) or is interrupted.
-
maspy.auxiliary.applyArrayFilters(array, posL, posR, matchMask)
#TODO: docstring
Parameters:
- array – #TODO: docstring
- posL – #TODO: docstring
- posR – #TODO: docstring
- matchMask – #TODO: docstring
Returns: numpy.array, a subset of the input array.
-
maspy.auxiliary.averagingData(array, windowSize=None, averagingType='median')
#TODO: docstring
Parameters:
- array – #TODO: docstring
- windowSize – #TODO: docstring
- averagingType – “median” or “mean”
Returns: #TODO: docstring
-
maspy.auxiliary.calcDeviationLimits(value, tolerance, mode)
Returns the upper and lower deviation limits for a value and a given tolerance, either as a relative or an absolute difference.
Parameters:
- value – can be a single value or a list of values. If a list of values is given, the minimal value is used to calculate the lower limit and the maximal value to calculate the upper limit.
- tolerance – a number used to calculate the limits
- mode – either absolute or relative, specifies how the tolerance should be applied to the value.
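A minimal sketch of the calculation, under two assumptions that the description does not state: a relative tolerance is interpreted as a fraction of the value, and the result is returned in the order (lower, upper). The function name is hypothetical:

```python
def calc_deviation_limits_sketch(value, tolerance, mode):
    """Illustrative version; semantics of 'relative' are an assumption."""
    # Accept a single number or a list of numbers.
    values = value if isinstance(value, (list, tuple)) else [value]
    if mode == 'relative':
        # Assumption: tolerance is a fraction, e.g. 0.25 means 25 percent.
        lower = min(values) * (1 - tolerance)
        upper = max(values) * (1 + tolerance)
    elif mode == 'absolute':
        lower = min(values) - tolerance
        upper = max(values) + tolerance
    else:
        raise ValueError('mode must be "absolute" or "relative"')
    return lower, upper


absolute_limits = calc_deviation_limits_sketch(100.0, 0.5, 'absolute')
relative_limits = calc_deviation_limits_sketch([100.0, 200.0], 0.25, 'relative')
```

With a list as input, the lower limit is derived from the smallest element and the upper limit from the largest one, as described for the value parameter.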
-
maspy.auxiliary.factorial = <maspy.auxiliary.Memoize object>
Returns the factorial of a number. The results of already calculated numbers are stored in factorial.memo.
-
maspy.auxiliary.findAllSubstrings(string, substring)
Returns a list of all substring starting positions in string, or an empty list if substring is not present in string.
Parameters:
- string – a template string
- substring – a string, which is looked for in the string parameter.
Returns: a list of substring starting positions in the template string
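One possible way to collect the starting positions is a loop around str.find. Whether overlapping occurrences are reported is not stated above; this sketch assumes they are, and the function name is hypothetical:

```python
def find_all_substrings_sketch(string, substring):
    """Illustrative version; counting overlapping matches is an assumption."""
    positions = []
    start = string.find(substring)
    while start != -1:
        positions.append(start)
        # Advance by one character so that overlapping matches are found too.
        start = string.find(substring, start + 1)
    return positions


overlapping = find_all_substrings_sketch('AKAKA', 'AKA')
single_letters = find_all_substrings_sketch('PEPTIDE', 'P')
missing = find_all_substrings_sketch('PEPTIDE', 'X')
```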
-
maspy.auxiliary.joinpath(path, *paths)
Join two or more pathname components, inserting “/” as needed and replacing all “\” by “/”.
Returns: str
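A short sketch of such a join, assuming it amounts to os.path.join followed by a normalisation of backslashes to forward slashes. The function name is hypothetical:

```python
import os


def joinpath_sketch(path, *paths):
    """Illustrative version: join components, then use '/' throughout."""
    return os.path.join(path, *paths).replace('\\', '/')


joined = joinpath_sketch('data\\raw', 'run1', 'spectra.mzML')
```

The result uses forward slashes regardless of the platform separator, which keeps stored paths comparable between operating systems.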
-
maspy.auxiliary.listFiletypes(targetfilename, directory)
Looks for all occurrences of a specified filename in a directory and returns a list of all present file extensions of this filename.
In this case everything after the first dot is considered to be the file extension: "filename.txt" -> "txt", "filename.txt.zip" -> "txt.zip"
Parameters:
- targetfilename – a filename without any extensions
- directory – only files present in this directory are compared to the targetfilename
Returns: a list of file extensions (str)
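The "everything after the first dot" rule can be sketched with str.partition. The function name is hypothetical, and sorting the directory listing is an assumption made only to get a reproducible order:

```python
import os
import tempfile


def list_filetypes_sketch(targetfilename, directory):
    """Illustrative version of the first-dot extension rule."""
    extensions = []
    for entry in sorted(os.listdir(directory)):
        if not os.path.isfile(os.path.join(directory, entry)):
            continue
        # Split at the FIRST dot only: "a.txt.zip" -> ("a", ".", "txt.zip").
        name, dot, extension = entry.partition('.')
        if dot and name == targetfilename:
            extensions.append(extension)
    return extensions


# Small demonstration in a throwaway directory.
workdir = tempfile.mkdtemp()
for filename in ('filename', 'filename.txt', 'filename.txt.zip', 'other.txt'):
    with open(os.path.join(workdir, filename), 'w') as handle:
        handle.write('')
extensions = list_filetypes_sketch('filename', workdir)
```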
-
maspy.auxiliary.loadBinaryItemContainer(zippedfile, jsonHook)
Imports binaryItems from a zipfile generated by writeBinaryItemContainer().
Parameters:
- zippedfile – can be either a path to a file (a string) or a file-like object
- jsonHook – a custom decoding function for JSON formatted strings of the binaryItems stored in the zipfile.
Returns: a dictionary containing binaryItems {binaryItem.id: binaryItem, ... }
-
maspy.auxiliary.log10factorial = <maspy.auxiliary.Memoize object>
Returns the log10 factorial of a number. The results of already calculated numbers are stored in log10factorial.memo.
-
maspy.auxiliary.matchingFilePaths(targetfilename, directory, targetFileExtension=None, selector=None)
Search for files in all subfolders of the specified directory and return the file paths of all matching instances.
Parameters:
- targetfilename – filename to search for, only the string before the last “.” is used for filename matching. Ignored if a selector function is specified.
- directory – search directory, including all subdirectories
- targetFileExtension – string after the last “.” in the filename, has to be identical if specified. “.” in targetFileExtension are ignored, thus “.txt” is treated equal to “txt”.
- selector – a function which is called with the value of targetfilename and has to return True (include value) or False (discard value). If no selector is specified, equality to targetfilename is used.
Returns: list of matching file paths (str)
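A recursive search of this kind could be sketched with os.walk. The function name is hypothetical. The sketch assumes that the selector is called with each candidate file name stripped of its last extension, and the result is sorted only to make the order reproducible:

```python
import os
import tempfile


def matching_file_paths_sketch(targetfilename, directory,
                               targetFileExtension=None, selector=None):
    """Illustrative version; the selector call convention is an assumption."""
    # Only the part before the last "." takes part in name matching.
    targetname = os.path.splitext(targetfilename)[0]
    if selector is None:
        selector = lambda name: name == targetname
    if targetFileExtension is not None:
        # ".txt" and "txt" are treated alike.
        targetFileExtension = targetFileExtension.replace('.', '')
    matches = []
    for dirpath, _dirnames, filenames in os.walk(directory):
        for filename in filenames:
            name, extension = os.path.splitext(filename)
            if not selector(name):
                continue
            if (targetFileExtension is not None
                    and extension.lstrip('.') != targetFileExtension):
                continue
            matches.append(os.path.join(dirpath, filename).replace('\\', '/'))
    return sorted(matches)


# Small demonstration in a throwaway directory tree.
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, 'sub'))
for relative in ('run1.mzML', 'sub/run1.txt', 'sub/run2.mzML'):
    with open(os.path.join(root, relative), 'w') as handle:
        handle.write('')

all_run1 = [os.path.relpath(p, root).replace('\\', '/')
            for p in matching_file_paths_sketch('run1', root)]
only_mzml = [os.path.relpath(p, root).replace('\\', '/')
             for p in matching_file_paths_sketch('run1', root, '.mzML')]
```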
-
maspy.auxiliary.openSafeReplace(filepath, mode='w+b')
Context manager to open a temporary file and replace the original file on closing.
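The write-to-temporary-then-replace pattern can be sketched with the standard library. The helper name open_then_replace is hypothetical, and the sketch is not the MasPy implementation. It assumes the temporary file is created next to the target and swapped in with os.replace:

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def open_then_replace(filepath, mode='w+b'):
    """Illustrative helper: write to a temporary file, then swap it in."""
    directory = os.path.dirname(os.path.abspath(filepath))
    # Create the temporary file next to the target so that the final
    # os.replace() stays on the same filesystem.
    fd, temppath = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, mode) as temp_handle:
            yield temp_handle
        # Only reached if the with-body raised no exception.
        os.replace(temppath, filepath)
    except BaseException:
        # Leave the original file untouched and discard the temporary one.
        if os.path.exists(temppath):
            os.remove(temppath)
        raise


workdir = tempfile.mkdtemp()
target = os.path.join(workdir, 'result.txt')
with open(target, 'wb') as handle:
    handle.write(b'old')

with open_then_replace(target) as handle:
    handle.write(b'new')

with open(target, 'rb') as handle:
    content = handle.read()
```

If the body of the with-statement raises, the original file keeps its previous content, because the replacement step is never reached.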
-
maspy.auxiliary.returnArrayFilters(arr1, arr2, limitsArr1, limitsArr2)
#TODO: docstring
Parameters:
- arr1 – #TODO: docstring
- arr2 – #TODO: docstring
- limitsArr1 – #TODO: docstring
- limitsArr2 – #TODO: docstring
Returns: #TODO: docstring
-
maspy.auxiliary.returnSplineList(dependentVar, independentVar, subsetPercentage=0.4, cycles=10, minKnotPoints=10, initialKnots=200, splineOrder=2, terminalExpansion=0.1)
#TODO: docstring
Note: Expects sorted arrays.
Parameters:
- dependentVar – #TODO: docstring
- independentVar – #TODO: docstring
- subsetPercentage – #TODO: docstring
- cycles – #TODO: docstring
- minKnotPoints – #TODO: docstring
- initialKnots – #TODO: docstring
- splineOrder – #TODO: docstring
- terminalExpansion – expand subsets on both sides
Returns: #TODO: docstring
-
maspy.auxiliary.searchFileLocation(targetFileName, targetFileExtension, rootDirectory, recursive=True)
Search for a filename with a specified file extension in all subfolders of the specified rootDirectory and return the first matching instance.
Parameters:
- targetFileName (str) – #TODO: docstring
- rootDirectory (str) – #TODO: docstring
- targetFileExtension (str) – #TODO: docstring
- recursive – bool, specify whether subdirectories should be searched
Returns: a filepath (str) or None
-
maspy.auxiliary.toList(variable, types=(<class 'str'>, <class 'int'>, <class 'float'>))
Converts a variable of type string, int or float to a list containing the variable as the only element.
Parameters: variable ((str, int, float, others)) – any python object
Returns: [variable] or variable
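The wrapping rule can be sketched with an isinstance check against the types tuple from the signature. The function name is hypothetical:

```python
def to_list_sketch(variable, types=(str, int, float)):
    """Illustrative version: wrap scalars, pass everything else through."""
    if isinstance(variable, types):
        return [variable]
    return variable


wrapped_string = to_list_sketch('run1')
wrapped_number = to_list_sketch(3.5)
unchanged = to_list_sketch(['run1', 'run2'])
```

This is convenient for parameters such as specfiles, which are documented to accept either a single name or a list of names.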
-
maspy.auxiliary.tolerantArrayMatching(referenceArray, matchArray, matchTolerance, matchUnit)
#TODO: docstring
Note: arrays must be sorted
Parameters:
- referenceArray – #TODO: docstring
- matchArray – #TODO: docstring
- matchTolerance – #TODO: docstring
- matchUnit – #TODO: docstring
Returns: #TODO: docstring
#TODO: change matchUnit to “absolute”, “relative” and remove the “*1e-6”
-
maspy.auxiliary.writeBinaryItemContainer(filelike, binaryItemContainer, compress=True)
Serializes the binaryItems contained in binaryItemContainer and writes them into a zipfile archive.
Examples of binaryItem classes are maspy.core.Ci and maspy.core.Sai. A binaryItem class has to define the function _reprJSON(), which returns a JSON formatted string representation of the class instance. In addition it has to contain an attribute .arrays, a dictionary whose values are numpy.array objects that are serialized to bytes and written to the binarydata file of the zip archive. See _dumpArrayDictToFile().
The JSON formatted string representation of the binaryItems, together with the metadata necessary to restore serialized numpy arrays, is written to the metadata file of the archive in this form: [[serialized binaryItem, [metadata of a numpy array, ...]], ...]
Use the method loadBinaryItemContainer() to restore a binaryItemContainer from a zipfile.
Parameters:
- filelike – path to a file (str) or a file-like object
- binaryItemContainer – a dictionary containing binaryItems
- compress – bool, True to use zip file compression
-
maspy.auxiliary.writeJsonZipfile(filelike, data, compress=True, mode='w', name='data')
Serializes the objects contained in data to a JSON formatted string and writes it to a zipfile.
Parameters:
- filelike – path to a file (str) or a file-like object
- data – object that should be converted to a JSON formatted string. Objects and types in data must be supported by the json.JSONEncoder or have the method ._reprJSON() defined.
- compress – bool, True to use zip file compression
- mode – ‘w’ to truncate and write a new file, or ‘a’ to append to an existing file
- name – the file name that will be given to the JSON output in the archive
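Writing a JSON string into a zip archive can be sketched with the standard zipfile module. The function name is hypothetical. The sketch uses the plain json encoder, whereas the real function presumably also handles objects that define ._reprJSON():

```python
import io
import json
import zipfile


def write_json_zipfile_sketch(filelike, data, compress=True, mode='w',
                              name='data'):
    """Illustrative version using only json and zipfile."""
    compression = zipfile.ZIP_DEFLATED if compress else zipfile.ZIP_STORED
    with zipfile.ZipFile(filelike, mode, compression=compression) as archive:
        # The JSON string becomes one archive member called `name`.
        archive.writestr(name, json.dumps(data))


# Round trip through an in-memory file-like object.
buffer = io.BytesIO()
write_json_zipfile_sketch(buffer, {'charge': 2, 'mz': [500.25, 501.25]})
with zipfile.ZipFile(buffer) as archive:
    restored = json.loads(archive.read('data').decode('utf-8'))
```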
maspy.constants module
Contains frequently used variables and constants, for example the exact masses of atoms and amino acids or the cleavage rules of the most common proteolytic enzymes.
-
maspy.constants.COMPOSITION
A Composition object stores the chemical composition of a substance. Basically, it is a dict object with the names of chemical elements as keys and values equal to the integer number of atoms of the corresponding element in the substance.
The main improvement over dict is that Composition objects allow addition and subtraction. For details see pyteomics.mass.Composition.
-
maspy.constants.aaComp
A dictionary with elemental compositions of the twenty standard amino acid residues. This concept was inherited from pyteomics.mass.std_aa_comp.
Keys: 'K', 'W', 'I', 'T', 'H', 'M', 'V', 'D', 'L', 'Y', 'G', 'R', 'N', 'S', 'C', 'A', 'P', 'E', 'F', 'Q'
-
maspy.constants.aaMass = {}
A dictionary with exact monoisotopic masses of the twenty standard amino acid residues. This concept was inherited from pyteomics.mass.std_aa_comp.
-
maspy.constants.aaModComp
A dictionary with elemental compositions of the peptide modifications. Modifications present at www.unimod.org should be written as “u:X”, where X is the unimod accession number. If a modification is not present in unimod, a text abbreviation should be used. This concept was inherited from pyteomics.mass.std_aa_comp.
Keys: '*', 'u:21', 'u:3', 'u:4', 'u:199', 'u:7', 'u:1020', 'u:35', 'u:188', 'u:28', 'u:34', 'u:1356', 'u:121', 'u:36', 'DSS', 'u:374', 'u:5', 'u:27', 'u:1'
TODO: in the future this table should be imported from two external files. The first is directly obtained from www.unimod.org, the second contains user specified entries. It is also possible to specify a modification folder where multiple user specified files can be deposited for importing.
-
maspy.constants.aaModMass = {}
A dictionary with exact monoisotopic masses of peptide modifications.
-
maspy.constants.expasy_rules = {'clostripain': 'R', 'caspase 2': '(?<=DVA)D(?=[^PEDQKR])', 'cnbr': 'M', 'caspase 5': '(?<=[LW]EH)D', 'bnps-skatole': 'W', 'caspase 1': '(?<=[FWYL]\\w[HAT])D(?=[^PEDQKR])', 'enterokinase': '(?<=[DE]{3})K', 'asp-n': '\\w(?=D)', 'granzyme b': '(?<=IEP)D', 'caspase 7': '(?<=DEV)D(?=[^PEDQKR])', 'caspase 6': '(?<=VE[HI])D(?=[^PEDQKR])', 'factor xa': '(?<=[AFGILTVM][DE]G)R', 'pepsin ph1.3': '((?<=[^HKR][^P])[^R](?=[FLWY][^P]))|((?<=[^HKR][^P])[FLWY](?=\\w[^P]))', 'formic acid': 'D', 'thermolysin': '[^DE](?=[AFILMV])', 'proteinase k': '[AEFILTVWY]', 'pepsin ph2.0': '((?<=[^HKR][^P])[^R](?=[FL][^P]))|((?<=[^HKR][^P])[FL](?=\\w[^P]))', 'arg-c': 'R', 'ntcb': '\\w(?=C)', 'caspase 10': '(?<=IEA)D', 'caspase 4': '(?<=LEV)D(?=[^PEDQKR])', 'caspase 3': '(?<=DMQ)D(?=[^PEDQKR])', 'caspase 9': '(?<=LEH)D', 'hydroxylamine': 'N(?=G)', 'caspase 8': '(?<=[IL]ET)D(?=[^PEDQKR])', 'thrombin': '((?<=G)R(?=G))|((?<=[AFGILTVM][AFGILTVWA]P)R(?=[^DE][^DE]))', 'trypsin simple': '[KR]', 'trypsin': '([KR](?=[^P]))|((?<=W)K(?=P))|((?<=M)R(?=P))', 'chymotrypsin high specificity': '([FY](?=[^P]))|(W(?=[^MP]))', 'iodosobenzoic acid': 'W', 'chymotrypsin low specificity': '([FLY](?=[^P]))|(W(?=[^MP]))|(M(?=[^PY]))|(H(?=[^DMPW]))', 'lysc': 'K', 'staphylococcal peptidase i': '(?<=[^E])E', 'proline endopeptidase': '(?<=[HKR])P(?=[^P])', 'glutamyl endopeptidase': 'E'}
The dictionary expasy_rules contains regular expressions for cleavage rules of the most popular proteolytic enzymes. The rules were copied from Pyteomics and initially taken from the PeptideCutter tool at Expasy.
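How such a rule might be applied can be sketched with re.finditer. The helper cleave_sketch is hypothetical, and the sketch assumes that cleavage takes place directly after each matched residue. The rule string is copied from the 'trypsin' entry above:

```python
import re

# Copied from the 'trypsin' entry of expasy_rules.
TRYPSIN_RULE = '([KR](?=[^P]))|((?<=W)K(?=P))|((?<=M)R(?=P))'


def cleave_sketch(sequence, rule):
    """Illustrative full digest; cutting after the match is an assumption."""
    peptides = []
    start = 0
    for match in re.finditer(rule, sequence):
        end = match.end()  # cut directly after the matched residue
        peptides.append(sequence[start:end])
        start = end
    if start < len(sequence):
        # Whatever follows the last cleavage site is the final peptide.
        peptides.append(sequence[start:])
    return peptides


peptides = cleave_sketch('AAKPEPRGGKTIDEK', TRYPSIN_RULE)
```

In this example the lysine followed by proline is not cleaved, which is the exception the lookahead (?=[^P]) encodes.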
maspy.core module
The core module contains python classes to represent spectra, peptide spectrum matches and peptide LC-MS features, and containers which manage storage, data access, saving and loading of these data types.
-
class maspy.core.Ci(identifier, specfile)
Bases: object
Chromatogram item (Ci), representation of a mzML chromatogram.
Variables:
- id – The unique id of this chromatogram. Typically descriptive for the chromatogram, eg “TIC” (total ion current). Is used together with self.specfile as a key to access the chromatogram in its container MsrunContainer.cic.
- specfile – An id representing a group of spectra, typically of the same mzML file / ms-run.
- dataProcessingRef – This attribute can optionally reference the ‘id’ of the appropriate dataProcessing, from mzML.
- precursor – The method of precursor ion selection and activation, from mzML.
- product – The method of product ion selection and activation in a precursor ion scan, from mzML.
- params – A list of parameter tuple, #TODO: as described elsewhere
- arrays – a dictionary containing the binary data of a chromatogram as numpy.array. Keys are derived from the specified mzML cvParam, see maspy.xml.findBinaryDataType(). Typically contains at least a time parameter rt (retention time): Ci.arrays = {'rt': numpy.array(), ...}
- arrayInfo – dictionary describing each data type present in .arrays.
{dataType: {'dataProcessingRef': str, 'params': [paramTuple, paramTuple, ...] } }
code example:
{u'i': {u'dataProcessingRef': None, u'params': [('MS:1000521', '', None), ('MS:1000574', '', None), ('MS:1000515', '', 'MS:1000131') ] }, u'rt': {u'dataProcessingRef': None, u'params': [('MS:1000523', '', None), ('MS:1000574', '', None), ('MS:1000595', '', 'UO:0000031') ] } }
-
arrayInfo
-
arrays
-
attrib
-
dataProcessingRef
-
id
-
static jsonHook(encoded)
Custom JSON decoder that allows construction of a new Ci instance from a decoded JSON object.
Parameters: encoded – a JSON decoded object literal (a dict)
Returns: “encoded” or one of these objects: Ci, MzmlProduct, MzmlPrecursor
-
params
-
precursor
-
product
-
specfile
-
class maspy.core.Fi(identifier, specfile)
Bases: object
Feature item (Fi), representation of a peptide LC-MS feature.
Variables:
- id – the unique identifier of a LC-MS feature, as generated by the software used for extracting features from MS1 spectra.
- specfile – An id representing an mzML file / ms-run filename.
- rt – a representative retention time value of the Fi (in seconds). For example the retention time of the feature apex.
- mz – a representative mass to charge value of the Fi (in Dalton / charge). For example the average m/z value of all data points.
- charge – the Fi charge state
- intensity – a measure for the Fi abundance, used for relative quantification. Typically the area of the feature intensity over time.
- isValid – bool or None if not specified. This attribute can be used to flag if a Fi has passed a given quality threshold. Can be used to filter valid elements in eg FiContainer.getArrays().
- isMatched – bool or None if not specified. True if any Si or Sii elements could be matched. Should be set to False on import.
- isAnnotated – bool or None if not specified. True if any Sii elements could be matched. Should be set to False on import. Not sure yet how to handle annotation from other features.
- siIds – list of tuple(specfile, id) from matched Si
- siiIds – list of tuple(specfile, id) from matched Sii
- peptide – peptide sequence containing amino acid modifications. If multiple peptide sequences are possible due to multiple Sii matches, the most likely must be chosen. A simple and accepted way to do this is by choosing the Sii identification with the best score.
- sequence – the plain amino acid sequence of self.peptide
- bestScore – the score of the accepted Sii for annotation
-
class maspy.core.FiContainer
Bases: object
Container for Fi elements.
Variables:
- container – contains the stored Fi elements.
{specfilename: {'Fi identifier': [Fi, ...], ...}}
- info – a dictionary containing information about imported specfiles.
{specfilename: {'path': str}, ... }
path: folder location used by the FiContainer to save and load data to the hard disk.
-
addSpecfile(specfiles, path)
Prepares the container for loading fic files by adding specfile entries to self.info. Use FiContainer.load() afterwards to actually import the files.
Parameters:
- specfiles (str or [str, str, ...]) – the name of an ms-run file or a list of names
- path – filedirectory used for loading and saving fic files
-
getArrays(attr=None, specfiles=None, sort=False, reverse=False, selector=None, defaultValue=None)
Return a condensed array of data selected from Fi instances from self.container for fast and convenient data processing.
Parameters:
- attr – list of Fi item attributes that should be added to the returned array. The attributes “id” and “specfile” are always included, in combination they serve as a unique id.
- defaultValue – if an item is missing an attribute, the “defaultValue” is added to the array instead.
- specfiles (str or [str, str, ...]) – filenames of ms-run files - if specified return only items from those files
- sort – if “sort” is specified the returned list of items is sorted according to the Fi attribute specified by “sort”, if the attribute is not present the item is skipped.
- reverse – bool, set True to reverse sort order
- selector – a function which is called with each Fi item and has to return True (include item) or False (discard item). Default function is: lambda si: True. By default only items with Fi.isValid == True are returned.
Returns: {‘attribute1’: numpy.array(), ‘attribute2’: numpy.array(), ... }
-
getItem(specfile, identifier)
Returns a Fi instance from self.container.
Parameters:
- specfile – a ms-run file name
- identifier – item identifier Fi.id
Returns: self.container[specfile][identifier]
-
getItems(specfiles=None, sort=False, reverse=False, selector=None)
Generator that yields filtered and/or sorted Fi instances from self.container.
Parameters:
- specfiles (str or [str, str, ...]) – filenames of ms-run files - if specified return only items from those files
- sort – if “sort” is specified the returned list of items is sorted according to the Fi attribute specified by “sort”, if the attribute is not present the item is skipped.
- reverse – bool, True reverses the sort order
- selector – a function which is called with each Fi item and has to return True (include item) or False (discard item). By default only items with Fi.isValid == True are returned.
Returns: items from container that passed the selector function
-
load(specfiles=None)
Imports the specified fic files from the hard disk.
Parameters: specfiles (None, str, [str, str]) – the name of an ms-run file or a list of names. If None all specfiles are selected.
-
removeAnnotation(specfiles=None)
Remove all annotation information from Fi elements.
Parameters: specfiles (None, str, [str, str]) – the name of an ms-run file or a list of names. If None all specfiles are selected.
-
removeSpecfile(specfiles)
Completely removes the specified specfiles from the FiContainer.
Parameters: specfiles – the name of an ms-run file or a list of names.
-
save(specfiles=None, compress=True, path=None)
Writes the specified specfiles to fic files on the hard disk.
Note
If .save() is called and no fic files are present in the specified path, new files are generated, otherwise old files are replaced.
Parameters:
- specfiles – the name of an ms-run file or a list of names. If None all specfiles are selected.
- compress – bool, True to use zip file compression
- path – filedirectory to which the fic files are written. By default the parameter is set to None and the filedirectory is read from self.info[specfile]['path']
-
setPath(folderpath, specfiles=None)
Changes the folderpath of the specified specfiles. The folderpath is used for saving and loading of fic files.
Parameters:
- specfiles (None, str, [str, str]) – the name of an ms-run file or a list of names. If None all specfiles are selected.
- folderpath – a filedirectory
-
class maspy.core.MsrunContainer
Bases: object
Container for mass spectrometry data (eg MS1 and MS2 spectra), provides full support for mzML files, see mzML schema documentation.
Variables:
- rmc – “run metadata container”, contains mzML metadata elements from the mzML file as a lxml.etree.Element object. This comprises all mzML subelements, except for the run element subelements spectrumList and chromatogramList.
- cic – “chromatogram item container”, see Ci
- smic – “spectrum metadata item container”, see Smi
- saic – “spectrum array item container”, see Sai
- sic – “spectrum item container”, see Si
- info – contains information about the imported specfiles.
{specfilename: {'path': str, 'status': {u'ci': bool, u'rm': bool, u'sai': bool, u'si': bool, u'smi': bool} }, ... }
path contains information about the file location used for saving and loading msrun files in the maspy data format. status describes which datatypes are currently imported.
code example:
{u'JD_06232014_sample1_A': {u'path': u'C:/filedirectory', u'status': {u'ci': True, u'rm': True, u'sai': True, u'si': True, u'smi': True } } }
Note
The structure of the containers rmc, cic, smic, saic and sic is always: {"specfilename": {"itemId": item, ...}, ...}
-
addSpecfile(specfiles, path)
Prepares the container for loading mrc files by adding specfile entries to self.info. Use MsrunContainer.load() afterwards to actually import the files.
Parameters:
- specfiles (str or [str, str, ...]) – the name of an ms-run file or a list of names
- path – filedirectory used for loading and saving mrc files
-
getArrays(attr=None, specfiles=None, sort=False, reverse=False, selector=None, defaultValue=None)
Return a condensed array of data selected from Si instances from self.sic for fast and convenient data processing.
Parameters:
- attr – list of Si item attributes that should be added to the returned array. The attributes “id” and “specfile” are always included, in combination they serve as a unique id.
- defaultValue – if an item is missing an attribute, the “defaultValue” is added to the array instead.
- specfiles (str or [str, str, ...]) – filenames of ms-run files, if specified return only items from those files
- sort – if “sort” is specified the returned list of items is sorted according to the Si attribute specified by “sort”, if the attribute is not present the item is skipped.
- reverse – bool, set True to reverse sort order
- selector – a function which is called with each Si item and has to return True (include item) or False (discard item). Default function is: lambda si: True
Returns: {‘attribute1’: numpy.array(), ‘attribute2’: numpy.array(), ... }
-
getItem(specfile, identifier)
Returns a Si instance from self.sic.
Parameters:
- specfile – a ms-run file name
- identifier – item identifier Si.id
Returns: self.sic[specfile][identifier]
-
getItems(specfiles=None, sort=False, reverse=False, selector=None)
Generator that yields filtered and/or sorted Si instances from self.sic.
Parameters:
- specfiles (str or [str, str, ...]) – filenames of ms-run files - if specified return only items from those files
- sort – if “sort” is specified the returned list of items is sorted according to the Si attribute specified by “sort”, if the attribute is not present the item is skipped.
- reverse – bool, True reverses the sort order
- selector – a function which is called with each Si item and returns True (include item) or False (discard item). Default function is: lambda si: True
Returns: items from container that passed the selector function
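The interplay of specfiles, sort, reverse and selector can be sketched on a plain nested dictionary with the documented layout {"specfilename": {"itemId": item, ...}, ...}. The function get_items_sketch, the SimpleNamespace stand-ins for Si, and the attributes msLevel and rt are hypothetical and serve illustration only:

```python
from types import SimpleNamespace


def get_items_sketch(container, specfiles=None, sort=False, reverse=False,
                     selector=None):
    """Illustrative generator over {"specfilename": {"itemId": item}}."""
    if selector is None:
        selector = lambda item: True
    if specfiles is None:
        specfiles = list(container)
    elif isinstance(specfiles, str):
        specfiles = [specfiles]
    items = []
    for specfile in specfiles:
        items.extend(container[specfile].values())
    if sort:
        # Items lacking the sort attribute are skipped, as documented.
        items = [item for item in items if hasattr(item, sort)]
        items.sort(key=lambda item: getattr(item, sort), reverse=reverse)
    for item in items:
        if selector(item):
            yield item


# Hypothetical stand-ins for Si instances.
sic = {
    'run1': {'1': SimpleNamespace(id='1', specfile='run1', msLevel=1, rt=10.0),
             '2': SimpleNamespace(id='2', specfile='run1', msLevel=2, rt=5.0)},
    'run2': {'1': SimpleNamespace(id='1', specfile='run2', msLevel=2, rt=7.5)},
}

ms2_ids = [(si.specfile, si.id)
           for si in get_items_sketch(sic, sort='rt',
                                      selector=lambda si: si.msLevel == 2)]
run1_rts = [si.rt for si in get_items_sketch(sic, specfiles='run1',
                                             sort='rt', reverse=True)]
```

The pair (specfile, id) identifies each yielded item uniquely, which is why getArrays() always includes both attributes.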
-
load(specfiles=None, rm=False, ci=False, smi=False, sai=False, si=False)
Import the specified datatypes from mrc files on the hard disk.
Parameters:
- specfiles (None, str, [str, str]) – the name of an ms-run file or a list of names. If None all specfiles are selected.
- rm – bool, True to import mrc_rm (run metadata)
- ci – bool, True to import mrc_ci (chromatogram items)
- smi – bool, True to import mrc_smi (spectrum metadata items)
- sai – bool, True to import mrc_sai (spectrum array items)
- si – bool, True to import mrc_si (spectrum items)
-
removeData(specfiles=None, rm=False, ci=False, smi=False, sai=False, si=False)
Removes the specified datatypes of the specfiles from the msrunContainer. To completely remove a specfile use MsrunContainer.removeSpecfile(), which also removes the complete entry from self.info.
Parameters:
- specfiles (None, str, [str, str]) – the name of an ms-run file or a list of names. If None all specfiles are selected.
- rm – bool, True to select self.rmc
- ci – bool, True to select self.cic
- smi – bool, True to select self.smic
- sai – bool, True to select self.saic
- si – bool, True to select self.sic
-
removeSpecfile(specfiles)
Completely removes the specified specfiles from the msrunContainer.
Parameters: specfiles (str, [str, str]) – the name of an ms-run file or a list of names. If None all specfiles are selected.
-
save(specfiles=None, rm=False, ci=False, smi=False, sai=False, si=False, compress=True, path=None)
Writes the specified datatypes to mrc files on the hard disk.
Note
If .save() is called and no mrc files are present in the specified path, new files are generated, otherwise old files are replaced.
Parameters:
- specfiles (None, str, [str, str]) – the name of an ms-run file or a list of names. If None all specfiles are selected.
- rm – bool, True to select self.rmc (run metadata)
- ci – bool, True to select self.cic (chromatogram items)
- smi – bool, True to select self.smic (spectrum metadata items)
- sai – bool, True to select self.saic (spectrum array items)
- si – bool, True to select self.sic (spectrum items)
- compress – bool, True to use zip file compression
- path – filedirectory to which the mrc files are written. By default the parameter is set to None and the filedirectory is read from self.info[specfile]['path']
-
setPath(folderpath, specfiles=None)
Changes the folderpath of the specified specfiles. The folderpath is used for saving and loading of mrc files.
Parameters:
- specfiles (None, str, [str, str]) – the name of an ms-run file or a list of names. If None all specfiles are selected.
- folderpath – a filedirectory
-
class maspy.core.MzmlPrecursor(spectrumRef=None, activation=None, isolationWindow=None, selectedIonList=None, **kwargs)
Bases: object
MasPy representation of an mzML precursor element, see mzML schema documentation.
Variables:
- spectrumRef – native id of the spectrum corresponding to the precursor spectrum
- activation – the mzML activation is represented as a tuple of param tuples. It is describing the type and energy level used for activation and should not be changed.
- isolationWindow – the mzML isolationWindow is represented as a tuple of param tuples. It is describing the measurement and should not be changed.
- selectedIonList – a list of mzML selectedIon elements, which are represented as a tuple of param tuples.
Note
The attributes “sourceFileRef” and “externalSpectrumID” are not supported by MasPy on purpose, since they are only used to refer to scans which are external to the mzML file.
-
activation
-
isolationWindow
-
selectedIonList
-
spectrumRef
-
class maspy.core.MzmlProduct(isolationWindow=None, **kwargs)
Bases: object
MasPy representation of an mzML Product element. The mzML schema documentation does not, however, provide much information on how this element is intended to be used or which information can be present.
Variables: isolationWindow – the mzML isolationWindow is represented as a tuple of param tuples. It is describing the measurement and should not be changed.
-
isolationWindow
-
class
maspy.core.MzmlScan(scanWindowList=None, params=None, **kwargs)[source]¶ Bases:
object
MasPy representation of an mzML Scan element, see mzML schema documentation.
Variables: - scanWindowList – a list of mzML scanWindow elements, which are represented as a tuple of param tuples. The mzML scanWindowList describes the measurement and should not be changed.
- params – a list of parameter tuples (cvParam tuple, userParam tuple or referencableParamGroup tuple) of an mzML Scan element.
Note
The attributes “sourceFileRef” and “externalSpectrumID” are not supported by MasPy on purpose, since they are only used to refer to scans which are external to the mzML file. The attribute “spectrumRef” could be included but appears to be of little use.
The attribute “instrumentConfigurationRef” should be included though: #TODO.
-
params¶
-
scanWindowList¶
-
class
maspy.core.Sai(identifier, specfile)[source]¶ Bases:
object
Spectrum array item (Sai), representation of the binary data arrays of an mzML spectrum.
Variables: - id – The unique id of this spectrum, typically the scan number. Is used together with self.specfile as a key to access the spectrum in its container MsrunContainer.saic. Should be derived from the spectrum's nativeID format (MS:1000767).
- specfile – An id representing a group of spectra, typically of the same mzML file / ms-run.
- arrays – a dictionary containing the binary data of the recorded ion spectrum as numpy.array. Keys are derived from the specified mzML cvParam, see maspy.xml.findBinaryDataType(). Must at least contain the keys mz (mass to charge ratio) and i (intensity).
Sai.arrays = {'mz': numpy.array(), 'i': numpy.array(), ...}
- arrayInfo – dictionary describing each data type present in .arrays.
{dataType: {'dataProcessingRef': str, 'params': [paramTuple, paramTuple, ...] } }
code example:
{u'i': {u'dataProcessingRef': None, u'params': [('MS:1000521', '', None), ('MS:1000574', '', None), ('MS:1000515', '', 'MS:1000131')]}, u'mz': {u'dataProcessingRef': None, u'params': [('MS:1000523', '', None), ('MS:1000574', '', None), ('MS:1000514', '', 'MS:1000040')]}}
-
arrayInfo¶
-
arrays¶
-
id¶
-
static
jsonHook(encoded)[source]¶ Custom JSON decoder that allows construction of a new
Sai instance from a decoded JSON object. Parameters: encoded – a JSON decoded object literal (a dict) Returns: “encoded” or Sai
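The jsonHook pattern can be illustrated with the standard library json module. This is a minimal sketch and not MasPy's implementation: the class SpectrumStub, the output format of its _reprJSON() method and the '__Stub__' marker key are assumptions made only for this illustration.

```python
import json


class SpectrumStub(object):
    """Hypothetical stand-in for a MasPy item such as Sai."""

    def __init__(self, identifier, specfile):
        self.id = identifier
        self.specfile = specfile

    def _reprJSON(self):
        # Assumed format: a marker key wrapping the instance attributes.
        return {'__Stub__': self.__dict__}

    @staticmethod
    def jsonHook(encoded):
        # json calls this hook for every decoded object literal (a dict).
        if '__Stub__' in encoded:
            data = encoded['__Stub__']
            return SpectrumStub(data['id'], data['specfile'])
        # Dicts without the marker key are returned unchanged.
        return encoded


original = SpectrumStub('1', 'run_01')
text = json.dumps(original._reprJSON())
restored = json.loads(text, object_hook=SpectrumStub.jsonHook)
plain = json.loads('{"a": 1}', object_hook=SpectrumStub.jsonHook)
```

The hook returns either the unchanged dict or a new instance, which matches the documented return value “encoded” or Sai.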
-
specfile¶
-
class
maspy.core.Si(identifier, specfile)[source]¶ Bases:
object
Spectrum item (Si) - this is the spectrum representation intended to be used in maspy. A simplified representation of spectrum metadata. Contains only specifically imported attributes, which are necessary for data analysis. Does not follow any PSI data structure or name space rules.
Additional attributes can be transferred from the corresponding Smi entry. This is done by default when importing an mzML file by using the function maspy.reader.defaultFetchSiAttrFromSmi().
Variables: - id – The unique id of this spectrum, typically the scan number. Is used together with self.specfile as a key to access the spectrum in its container MsrunContainer.sic. Should be derived from the spectrum's nativeID format (MS:1000767).
- specfile – An id representing a group of spectra, typically of the same mzML file / ms-run.
- isValid – bool, can be used for filtering.
- msLevel – stage of ms level in a multi stage mass spectrometry experiment.
-
class
maspy.core.Sii(identifier, specfile)[source]¶ Bases:
object
Spectrum identification item (Sii) - representation of an MSn fragment spectrum annotation, also referred to as peptide spectrum match (PSM).
Variables: - id – The unique id of this spectrum, typically the scan number. Is used together with self.specfile as a key to access the spectrum in its container SiiContainer or the corresponding spectrum in a MsrunContainer.
- specfile – An id representing an mzML file / ms-run filename.
- rank – The rank of this Sii compared to others for the same MSn spectrum. The rank is based on a score defined in the SiiContainer. If multiple Sii have the same top score, they should all be assigned self.rank = 1.
- isValid – bool, or None if not specified. This attribute can be used to flag if a Sii has passed a given quality threshold or has been validated as correct. Is used to filter valid elements in e.g. SiiContainer.getArrays().
-
class
maspy.core.SiiContainer[source]¶ Bases:
object
Container for Sii elements.
Variables: - container – contains the stored Sii elements.
{specfilename: {'Sii identifier': [Sii, ...], ...}, ...}
- info – a dictionary containing information about imported specfiles.
{specfilename: {'path': str, 'qcAttr': str, 'qcLargerBetter': bool, 'qcCutoff': float, 'rankAttr': str, 'rankLargerBetter': bool }, ... }
path: folder location used by the SiiContainer to save and load data to the hard disk.
qcAttr: name of the parameter used to define a quality cutoff. Typically this is some sort of global false positive estimator (e.g. FDR).
qcLargerBetter: bool, True if a large value for the .qcAttr means a higher confidence.
qcCutoff: float, the quality threshold for the specified .qcAttr
rankAttr: name of the parameter used for ranking Sii according to how well they match a fragment ion spectrum, in the case when there are multiple Sii present for the same spectrum.
rankLargerBetter: bool, True if a large value for the .rankAttr means a better match to the fragment ion spectrum
Note
In the future this container may be integrated in an evidence or an mzIdentML like container.
-
addSiInfo(msrunContainer, specfiles=None, attributes=['obsMz', 'rt', 'charge'])[source]¶ Transfer attributes to
Sii elements from the corresponding Si in MsrunContainer.sic. If an attribute is not present in the Si, the attribute value in the Sii is set to None. Attribute examples: ‘obsMz’, ‘rt’, ‘charge’, ‘tic’, ‘iit’, ‘ms1Id’
Parameters: - msrunContainer – an instance of MsrunContainer which has imported the corresponding specfiles
- specfiles – the name of an ms-run file or a list of names. If None all specfiles are selected.
- attributes – a list of Si attributes that should be transferred.
-
addSpecfile(specfiles, path)[source]¶ Prepares the container for loading
siic files by adding specfile entries to self.info. Use SiiContainer.load() afterwards to actually import the files.
Parameters: - specfiles (str or [str, str, ...]) – the name of an ms-run file or a list of names
- path – filedirectory used for loading and saving siic files
-
calcMz(specfiles=None, guessCharge=True, obsMzKey='obsMz')[source]¶ Calculate the exact mass for
Sii elements from the Sii.peptide sequence.
Parameters: - specfiles – the name of an ms-run file or a list of names. If None all specfiles are selected.
- guessCharge – bool, True if the charge should be guessed if the attribute charge is missing from Sii. Uses the calculated peptide mass and the observed m/z value to calculate the charge.
- obsMzKey – attribute name of the observed m/z value in Sii.
-
getArrays(attr=None, specfiles=None, sort=False, reverse=False, selector=None, defaultValue=None)[source]¶ Return a condensed array of data selected from
Sii instances from self.container for fast and convenient data processing.
Parameters: - attr – list of Sii item attributes that should be added to the returned array. The attributes “id” and “specfile” are always included, in combination they serve as a unique id.
- defaultValue – if an item is missing an attribute, the “defaultValue” is added to the array instead.
- specfiles (str or [str, str, ...]) – filenames of ms-run files - if specified return only items from those files
- sort – if “sort” is specified the returned list of items is sorted according to the Sii attribute specified by “sort”, if the attribute is not present the item is skipped.
- reverse – bool, set True to reverse sort order
- selector – a function which is called with each Sii item and has to return True (include item) or False (discard item). Default function is: lambda si: True. By default only items with Sii.isValid == True are returned.
Returns: {‘attribute1’: numpy.array(), ‘attribute2’: numpy.array(), ... }
-
getItems(specfiles=None, sort=False, reverse=False, selector=None)[source]¶ Generator that yields filtered and/or sorted
Sii instances from self.container.
Parameters: - specfiles (str or [str, str, ...]) – filenames of ms-run files - if specified return only items from those files
- sort – if “sort” is specified the returned list of items is sorted according to the Sii attribute specified by “sort”, if the attribute is not present the item is skipped.
- reverse – bool, True reverses the sort order
- selector – a function which is called with each Sii item and has to return True (include item) or False (discard item). By default only items with Sii.isValid == True are returned.
Returns: items from container that passed the selector function
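The filtering and sorting behaviour described for getItems() can be sketched without MasPy. The class Item and the function getItemsSketch below are hypothetical stand-ins written for this illustration. Only the container layout {specfile: {identifier: [item, ...]}} and the parameter semantics are taken from the description above.

```python
class Item(object):
    """Hypothetical stand-in for Sii with only the attributes used here."""

    def __init__(self, identifier, specfile, isValid, score=None):
        self.id = identifier
        self.specfile = specfile
        self.isValid = isValid
        if score is not None:
            self.score = score


def getItemsSketch(container, specfiles=None, sort=False, reverse=False,
                   selector=None):
    """Yield filtered and optionally sorted items from the container."""
    if selector is None:
        # Default described above: only valid items pass.
        def selector(item):
            return item.isValid is True
    if specfiles is None:
        specfiles = sorted(container)
    elif isinstance(specfiles, str):
        specfiles = [specfiles]
    items = []
    for specfile in specfiles:
        for itemList in container[specfile].values():
            items.extend(item for item in itemList if selector(item))
    if sort:
        # Items lacking the sort attribute are skipped.
        items = [item for item in items if hasattr(item, sort)]
        items.sort(key=lambda item: getattr(item, sort), reverse=reverse)
    for item in items:
        yield item


container = {'run_01': {
    '1': [Item('1', 'run_01', True, 0.9), Item('1', 'run_01', False, 0.2)],
    '2': [Item('2', 'run_01', True, 0.5)],
    '3': [Item('3', 'run_01', True)],
}}
sortedIds = [item.id for item in getItemsSketch(container, sort='score')]
everything = list(getItemsSketch(container, selector=lambda item: True))
```

Passing a custom selector replaces the default validity filter, so invalid items are only returned when the selector explicitly accepts them.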
-
getValidItem(specfile, identifier)[source]¶ Returns a
Sii instance from self.container if it is valid. If all elements of self.container[specfile][identifier] are Sii.isValid == False then None is returned.
Parameters: - specfile – a ms-run file name
- identifier – item identifier Sii.id
Returns: Sii or None
-
load(specfiles=None)[source]¶ Imports
siic files from the hard disk. Parameters: specfiles (None, str, [str, str]) – the name of an ms-run file or a list of names. If None all specfiles are selected.
-
removeSpecfile(specfiles)[source]¶ Completely removes the specified specfiles from the
SiiContainer. Parameters: specfiles – the name of an ms-run file or a list of names.
-
save(specfiles=None, compress=True, path=None)[source]¶ Writes the specified specfiles to
siic files on the hard disk.
Note
If .save() is called and no siic files are present in the specified path, new files are generated, otherwise old files are replaced.
Parameters: - specfiles – the name of an ms-run file or a list of names. If None all specfiles are selected.
- compress – bool, True to use zip file compression
- path – filedirectory to which the siic files are written. By default the parameter is set to None and the filedirectory is read from self.info[specfile]['path']
-
setPath(folderpath, specfiles=None)[source]¶ Changes the folderpath of the specified specfiles. The folderpath is used for saving and loading of
siic files. Parameters: - folderpath – a filedirectory
- specfiles (None, str, [str, str]) – the name of an ms-run file or a list of names. If None all specfiles are selected.
-
class
maspy.core.Smi(identifier, specfile)[source]¶ Bases:
object
Spectrum metadata item (Smi), representation of all the metadata of an mzML spectrum, excluding the actual binary data.
For details on the mzML spectrum element refer to the documentation.
Variables: - id – The unique id of this spectrum, typically the scan number. Is used together with self.specfile as a key to access the spectrum in its container MsrunContainer.smic. Should be derived from the spectrum's nativeID format (MS:1000767).
- specfile – An id representing a group of spectra, typically of the same mzML file / ms-run.
- attributes – dict, attributes of an mzML spectrum element
- params – a list of parameter tuples (cvParam tuple, userParam tuple or referencableParamGroup tuple) of an mzML spectrum element.
- scanListParams – a list of parameter tuples (cvParam tuple, userParam tuple or referencableParamGroup tuple) of an mzML scanList element.
- scanList – a list of MzmlScan elements, derived from elements of an mzML scanList element.
- precursorList – a list of MzmlPrecursor elements, derived from elements of an mzML precursorList element.
- productList – a list of MzmlProduct elements, derived from elements of an mzML productList element.
Warning
The Smi is used to generate spectrum xml elements by using the function maspy.writer.xmlSpectrumFromSmi(). In order to generate a valid mzML element all attributes of Smi have to be in the correct format. Therefore it is highly recommended to only use properly implemented and tested methods for making changes to any Smi attribute.
-
attributes¶
-
id¶
-
static
jsonHook(encoded)[source]¶ Custom JSON decoder that allows construction of a new
Smi instance from a decoded JSON object. Parameters: encoded – a JSON decoded object literal (a dict) Returns: “encoded” or one of these objects: Smi, MzmlScan, MzmlProduct, MzmlPrecursor
-
params¶
-
precursorList¶
-
productList¶
-
scanList¶
-
scanListParams¶
-
specfile¶
-
maspy.core.addMsrunContainers(mainContainer, subContainer)[source]¶ Adds the complete content of all specfile entries from the subContainer to the mainContainer. However if a specfile of
subContainer.info is already present in mainContainer.info, its contents are not added to the mainContainer.
Parameters: - mainContainer – MsrunContainer
- subContainer – MsrunContainer
Warning
This function does not generate new items. All items added to the mainContainer are still present in the subContainer, and changes made to elements of one container also affect the elements of the other one (i.e. elements share the same memory location).
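The consequence of this warning can be demonstrated with plain Python objects. The following is a sketch under stated assumptions: the class Container and the function addContainersSketch are hypothetical and only mimic the documented behaviour (skip specfiles that already exist, add references instead of copies).

```python
class Container(object):
    """Hypothetical stand-in for MsrunContainer with two of its attributes."""

    def __init__(self):
        self.info = {}
        self.sic = {}


def addContainersSketch(mainContainer, subContainer):
    for specfile in subContainer.info:
        if specfile in mainContainer.info:
            # Specfiles already present in the main container are skipped.
            continue
        # References are added, no copies are made.
        mainContainer.info[specfile] = subContainer.info[specfile]
        mainContainer.sic[specfile] = subContainer.sic[specfile]


main, sub = Container(), Container()
sub.info['run_01'] = {'path': '/data'}
sub.sic['run_01'] = {'1': {'msLevel': 1}}
addContainersSketch(main, sub)

# A change made through the main container is visible in the sub container.
main.sic['run_01']['1']['msLevel'] = 2
seenInSub = sub.sic['run_01']['1']['msLevel']
```

If independent containers are needed, one option is to copy the items (for example with copy.deepcopy) before merging.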
maspy.errors module¶
#TODO: module description
maspy.featuremethods module¶
#TODO: module description
-
maspy.featuremethods.matchToFeatures(fiContainer, specContainer, specfiles=None, fMassKey='mz', sMassKey='obsMz', isotopeErrorList=0, precursorTolerance=5, toleranceUnit='ppm', rtExpansionUp=0.1, rtExpansionDown=0.05, matchCharge=True, scoreKey='pep', largerBetter=False)[source]¶ Annotate
Fi (Feature items) by matching Si (Spectrum items) or Sii (Spectrum identification items).
Parameters: - fiContainer – maspy.core.FeatureContainer, contains Fi.
- specContainer – maspy.core.MsrunContainer or maspy.core.SiiContainer, contains Si or Sii.
- specfiles (str, list or None) – filenames of ms-run files, if specified consider only items from those files
- fMassKey – mass attribute key in Fi.__dict__
- sMassKey – mass attribute key in Si.__dict__ or Sii.__dict__ (e.g. ‘obsMz’, ‘excMz’)
- isotopeErrorList (list or tuple of int) – allowed isotope errors relative to the spectrum mass, for example “0” or “1”. If no feature has been matched with isotope error 0, the spectrum mass is increased by the mass difference of carbon isotopes 12 and 13 and matched again. The different isotope error values are tested in the specified order, therefore “0” should normally be the first value of the list.
- precursorTolerance – the largest allowed mass deviation of Si or Sii relative to Fi
- toleranceUnit – defines how the precursorTolerance is applied to the mass value of Fi. "ppm": mass * (1 +/- tolerance*1E-6) or "da": mass +/- value
- rtExpansionUp – relative upper expansion of Fi retention time area. limitHigh = Fi.rtHigh + (Fi.rtHigh - Fi.rtLow) * rtExpansionUp
- rtExpansionDown – relative lower expansion of Fi retention time area. limitLow = Fi.rtLow - (Fi.rtHigh - Fi.rtLow) * rtExpansionDown
- matchCharge – bool, True if Fi and Si or Sii must have the same charge state to be matched.
- scoreKey – Sii attribute name used for scoring the identification reliability
- largerBetter – bool, True if higher score value means a better identification reliability
#TODO: this function is nested pretty badly and should maybe be rewritten
#TODO: replace tolerance unit “ppm” by tolerance mode “relative” and change respective calculations
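The mass tolerance and retention time expansion formulas given for toleranceUnit, rtExpansionUp and rtExpansionDown can be written out as a short sketch. The function names massLimits and rtLimits are hypothetical and not part of MasPy. Only the formulas are taken from the parameter descriptions above.

```python
def massLimits(mass, tolerance, toleranceUnit='ppm'):
    """Return the (lower, upper) mass limits around a feature mass."""
    if toleranceUnit == 'ppm':
        # Relative tolerance: mass * (1 +/- tolerance * 1E-6)
        return mass * (1 - tolerance * 1e-6), mass * (1 + tolerance * 1e-6)
    if toleranceUnit == 'da':
        # Absolute tolerance: mass +/- value
        return mass - tolerance, mass + tolerance
    raise ValueError('unknown toleranceUnit: %r' % (toleranceUnit,))


def rtLimits(rtLow, rtHigh, rtExpansionUp=0.1, rtExpansionDown=0.05):
    """Return the expanded (limitLow, limitHigh) retention time area."""
    width = rtHigh - rtLow
    limitLow = rtLow - width * rtExpansionDown
    limitHigh = rtHigh + width * rtExpansionUp
    return limitLow, limitHigh


ppmLow, ppmHigh = massLimits(1000.0, 5)
daLow, daHigh = massLimits(1000.0, 0.01, toleranceUnit='da')
limitLow, limitHigh = rtLimits(100.0, 120.0)
```

With the default expansion values, a feature eluting between 100 and 120 is matched against spectra between roughly 99 and 122 in the same time unit.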
-
maspy.featuremethods.rtCalibration(fiContainer, allowedRtDev=60, allowedMzDev=2.5, reference=None, specfiles=None, showPlots=False, plotDir=None, minIntensity=100000.0)[source]¶ Performs a retention time calibration between
FeatureItem of multiple specfiles.
Variables: - fiContainer – Perform alignment on FeatureItem in FeatureContainer.specfiles
- allowedRtDev – maximum retention time difference of two features in two runs to be matched
- allowedMzDev – maximum relative m/z difference (in ppm) of two features in two runs to be matched
- showPlots – boolean, True if a plot should be generated which shows the results of the calibration
- plotDir – if not None and showPlots is True, the plots are saved to this location.
- reference – Can be used to specifically specify a reference specfile
- specfiles – Limit alignment to those specfiles in the fiContainer
- minIntensity – consider only features with an intensity above this value
maspy.mit_stats module¶
This module contains functions to calculate a running mean, median and mode.
-
maspy.mit_stats.runningMean(seq, N, M)[source]¶
Purpose: Find the mean for the points in a sliding window (fixed size) as it is moved from left to right by one point at a time.
Inputs:
seq – list containing items for which a mean (in a sliding window) is to be calculated (N items)
N – length of sequence
M – number of items in sliding window
Outputs:
means – list of means with size N - M + 1
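A sliding window mean as described above can be sketched in a few lines. This is not the code of maspy.mit_stats. The function name runningMeanSketch is hypothetical, and unlike the documented signature it derives N from len(seq).

```python
def runningMeanSketch(seq, M):
    """Return the means of all windows of size M, N - M + 1 values in total."""
    N = len(seq)
    if not 1 <= M <= N:
        raise ValueError('window size M must be between 1 and len(seq)')
    # Sum of the first window, then update it incrementally.
    windowSum = float(sum(seq[:M]))
    means = [windowSum / M]
    for i in range(M, N):
        # Add the entering element, subtract the leaving element.
        windowSum += seq[i] - seq[i - M]
        means.append(windowSum / M)
    return means


means = runningMeanSketch([1, 2, 3, 4, 5], 3)
```

Updating the window sum incrementally avoids summing M elements for every window position.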
-
maspy.mit_stats.runningMedian(seq, M)[source]¶
Purpose: Find the median for the points in a sliding window (odd number in size) as it is moved from left to right by one point at a time.
Inputs:
seq – list containing items for which a running median (in a sliding window) is to be calculated
M – number of items in window (window size) – must be an integer > 1
Outputs:
medians – list of medians with size N - M + 1
Note:
The median of a finite list of numbers is the “center” value when this list is sorted in ascending order.
If M is an even number the two elements in the window that are close to the center are averaged to give the median (this is not by definition)
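One way to compute such a running median is to keep the current window as a sorted list and update it with the bisect module. The sketch below is an assumption about how this could be done, not the implementation in maspy.mit_stats, and the name runningMedianSketch is hypothetical.

```python
from bisect import bisect_left, insort


def runningMedianSketch(seq, M):
    """Return the medians of all windows of size M, N - M + 1 values in total."""
    N = len(seq)
    if not 1 < M <= N:
        raise ValueError('window size M must be > 1 and <= len(seq)')
    window = sorted(seq[:M])
    mid = M // 2
    medians = []
    for i in range(M, N + 1):
        if M % 2:
            medians.append(window[mid])
        else:
            # Even window size: average the two central elements.
            medians.append((window[mid - 1] + window[mid]) / 2.0)
        if i == N:
            break
        # Slide the window: drop the oldest element, insert the newest.
        del window[bisect_left(window, seq[i - M])]
        insort(window, seq[i])
    return medians


oddWindow = runningMedianSketch([5, 1, 3, 2, 4], 3)
evenWindow = runningMedianSketch([1, 2, 3, 4], 2)
```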
maspy.peptidemethods module¶
Provides functions to work with peptide sequences, mass to charge ratios and modifications, and the calculation of masses.
-
maspy.peptidemethods.calcMassFromMz(mz, charge)[source]¶ Calculate the mass of a peptide from its mz and charge.
Parameters: - mz – float, mass to charge ratio (Dalton / charge)
- charge – int, charge state
Returns: non protonated mass (charge = 0)
-
maspy.peptidemethods.calcMhFromMz(mz, charge)[source]¶ Calculate the MH+ value from mz and charge.
Parameters: - mz – float, mass to charge ratio (Dalton / charge)
- charge – int, charge state
Returns: mass to charge ratio of the mono protonated ion (charge = 1)
-
maspy.peptidemethods.calcMzFromMass(mass, charge)[source]¶ Calculate the mz value of a peptide from its mass and charge.
Parameters: - mass – float, exact non protonated mass
- charge – int, charge state
Returns: mass to charge ratio of the specified charge state
-
maspy.peptidemethods.calcMzFromMh(mh, charge)[source]¶ Calculate the mz value from MH+ and charge.
Parameters: - mh – float, mass to charge ratio (Dalton / charge) of the mono protonated ion
- charge – int, charge state
Returns: mass to charge ratio of the specified charge state
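The relations between neutral mass, m/z and MH+ used by these four functions can be sketched as follows. The function names carry a Sketch suffix to make clear that this is not MasPy's code, and the constant PROTON_MASS is an assumed value for the proton mass in Dalton. MasPy may use a slightly different constant.

```python
PROTON_MASS = 1.00727646688  # assumed proton mass in Dalton


def calcMassFromMzSketch(mz, charge):
    """Non protonated mass (charge = 0) from m/z and charge."""
    return (mz - PROTON_MASS) * charge


def calcMzFromMassSketch(mass, charge):
    """m/z of the given charge state from the non protonated mass."""
    return mass / charge + PROTON_MASS


def calcMhFromMzSketch(mz, charge):
    """m/z of the mono protonated ion (charge = 1) from m/z and charge."""
    return calcMzFromMassSketch(calcMassFromMzSketch(mz, charge), 1)


def calcMzFromMhSketch(mh, charge):
    """m/z of the given charge state from the MH+ value."""
    return calcMzFromMassSketch(calcMassFromMzSketch(mh, 1), charge)


mz = calcMzFromMassSketch(1000.0, 2)
mass = calcMassFromMzSketch(mz, 2)
mh = calcMhFromMzSketch(mz, 2)
```

Each conversion goes through the neutral mass, so the functions are consistent with each other and round trips return the starting value up to floating point precision.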
-
maspy.peptidemethods.calcPeptideMass(peptide, **kwargs)[source]¶ Calculate the mass of a peptide.
Parameters: - aaMass – A dictionary with the monoisotopic masses of amino acid residues, by default maspy.constants.aaMass
- aaModMass – A dictionary with the monoisotopic mass changes of modifications, by default maspy.constants.aaModMass
- elementMass – A dictionary with the masses of chemical elements, by default pyteomics.mass.nist_mass
- peptide – peptide sequence, modifications have to be written in the format “[modificationId]” and “modificationId” has to be present in maspy.constants.aaModMass
#TODO: change to a more efficient way of calculating the modified mass, by first extracting all present modifications and then looking up their masses.
-
maspy.peptidemethods.digestInSilico(proteinSequence, cleavageRule='[KR]', missedCleavage=0, removeNtermM=True, minLength=5, maxLength=55)[source]¶ Returns a list of peptide sequences and cleavage information derived from an in silico digestion of a polypeptide.
Parameters: - proteinSequence – amino acid sequence of the polypeptide to be digested
- cleavageRule – cleavage rule expressed in a regular expression, see maspy.constants.expasy_rules
- missedCleavage – number of allowed missed cleavage sites
- removeNtermM – bool, True to consider also peptides with the N-terminal methionine of the protein removed
- minLength – int, only yield peptides with length >= minLength
- maxLength – int, only yield peptides with length <= maxLength
Returns: a list of resulting peptide entries. Protein positions start with 1 and end with len(proteinSequence).
[(peptide amino acid sequence, {'startPos': int, 'endPos': int, 'missedCleavage': int} ), ... ]
Note
This is a regex example for specifying N-terminal cleavage at lysine sites
\w(?=[K])
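How a regular expression cleavage rule translates into peptides can be sketched with the re module. This is a simplified illustration and not the implementation of digestInSilico(): the function digestSketch is hypothetical, it assumes that cleavage takes place directly after the end of each regex match, and it omits the removeNtermM option.

```python
import re


def digestSketch(proteinSequence, cleavageRule='[KR]', missedCleavage=0,
                 minLength=5, maxLength=55):
    """Return [(peptide, {'startPos', 'endPos', 'missedCleavage'}), ...]."""
    # Assumption: the protein is cleaved after the end of each match.
    sites = [m.end() for m in re.finditer(cleavageRule, proteinSequence)]
    # Fragment boundaries: sequence start, cleavage sites, sequence end.
    bounds = sorted(set([0] + sites + [len(proteinSequence)]))
    peptides = []
    for i in range(len(bounds) - 1):
        for mc in range(missedCleavage + 1):
            j = i + 1 + mc
            if j >= len(bounds):
                break
            start, end = bounds[i], bounds[j]
            if minLength <= end - start <= maxLength:
                # Protein positions are one based and inclusive.
                info = {'startPos': start + 1, 'endPos': end,
                        'missedCleavage': mc}
                peptides.append((proteinSequence[start:end], info))
    return peptides


cTerminal = digestSketch('AAAKGGGRCCC', minLength=1)
withMissed = digestSketch('AAAKGGGRCCC', missedCleavage=1, minLength=1)
nTerminal = digestSketch('AAAKGGG', cleavageRule=r'\w(?=[K])', minLength=1)
```

With the default rule the protein is cut after K or R, while the lookahead rule from the note cuts before K because the match ends one residue earlier.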
-
maspy.peptidemethods.removeModifications(peptide)[source]¶ Removes all modifications from a peptide string and return the plain amino acid sequence.
Parameters: - peptide – peptide sequence, modifications have to be written in the format “[modificationName]”
- peptide – str
Returns: amino acid sequence of peptide without any modifications
-
maspy.peptidemethods.returnModPositions(peptide, indexStart=1, removeModString='UNIMOD:')[source]¶ Determines the amino acid positions of all present modifications.
Parameters: - peptide – peptide sequence, modifications have to be written in the format “[modificationName]”
- indexStart – returned amino acids positions of the peptide start with this number (first amino acid position = indexStart)
- removeModString – string to remove from the returned modification name
Returns: {modificationName:[position1, position2, ...], ...}
#TODO: adapt removeModString to the new unimod ids in #maspy.constants.aaModComp (“UNIMOD:X” -> “u:X”) -> also change unit tests.
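The bracket notation for modifications can be processed with a single regular expression. The two functions below are hypothetical sketches, not MasPy's code. They assume that a “[modificationName]” tag directly follows the modified amino acid, so a tag at the very start of the string would receive the position indexStart - 1.

```python
import re
from collections import defaultdict

# Matches one modification tag such as "[UNIMOD:35]".
MOD_PATTERN = re.compile(r'\[([^\]]*)\]')


def removeModificationsSketch(peptide):
    """Return the plain amino acid sequence without modification tags."""
    return MOD_PATTERN.sub('', peptide)


def returnModPositionsSketch(peptide, indexStart=1, removeModString='UNIMOD:'):
    """Return {modificationName: [position1, position2, ...], ...}."""
    positions = defaultdict(list)
    residueCount = 0  # amino acids seen before the current tag
    cursor = 0        # end of the previous tag in the peptide string
    for match in MOD_PATTERN.finditer(peptide):
        residueCount += match.start() - cursor
        cursor = match.end()
        name = match.group(1).replace(removeModString, '')
        positions[name].append(residueCount - 1 + indexStart)
    return dict(positions)


plain = removeModificationsSketch('AM[UNIMOD:35]KM[UNIMOD:35]')
oneBased = returnModPositionsSketch('AM[UNIMOD:35]KM[UNIMOD:35]')
zeroBased = returnModPositionsSketch('AM[UNIMOD:35]KM[UNIMOD:35]', indexStart=0)
```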
maspy.proteindb module¶
The protein database module allows the import of protein sequences from fasta files, parsing of fasta entry headers and performing in silico digestion by specified cleavage rules to generate peptides.
-
class
maspy.proteindb.PeptideSequence(sequence, mc=None)[source]¶ Bases:
object
Describes a peptide as derived by digestion of one or multiple proteins, can’t contain any modified amino acids.
Parameters: - sequence – amino acid sequence of the peptide
- missedCleavage – number of missed cleavages, depends on enzyme specificity
- proteins – protein ids that generate this peptide under certain digest condition
- proteinPositions – start position and end position of a peptide in a
protein sequence. One based index, ie the first protein position is “1”.
{proteinId:(startPosition, endPositions) ...}
-
isUnique¶
-
static
jsonHook(encoded)[source]¶ Custom JSON decoder that allows construction of a new
PeptideSequence instance from a decoded JSON object. Parameters: encoded – a JSON decoded object literal (a dict) Returns: “encoded” or PeptideSequence
-
missedCleavage¶
-
proteinPositions¶
-
proteins¶
-
sequence¶
-
class
maspy.proteindb.ProteinDatabase[source]¶ Bases:
object
Describes proteins and peptides generated by an in silico digestion of proteins.
Variables: - peptides – {sequence:PeptideSequence(), ...} contains elements of PeptideSequence derived by an in silico digest of the proteins
- proteins – {proteinId:Protein(), proteinId:Protein()}, used to access ProteinSequence elements by their id
- proteinNames – {proteinName:Protein(), proteinName:Protein()}, alternative way to access ProteinSequence elements by their names. Must be populated manually
- info – a dictionary containing information about the protein database and parameters specified for the in silico digestion of the protein entries.
{'name': str, 'mc': str, 'cleavageRule': str, 'minLength': int, 'maxLength': int, 'ignoreIsoleucine': bool, 'removeNtermM': bool }
name: a descriptive name of the protein database, used as the file name when saving the protein database to the hard disk
mc: number of allowed missed cleavage sites
cleavageRule: cleavage rule expressed in a regular expression
minLength: minimal peptide length
maxLength: maximal peptide length
ignoreIsoleucine: if True, Isoleucine and Leucine in peptide sequences are treated as indistinguishable.
removeNtermM: if True, peptides with the N-terminal Methionine of the protein removed are also considered.
-
calculateCoverage()[source]¶ Calculate the sequence coverage masks for all protein entries.
For a detailed description see
_calculateCoverageMasks()
-
classmethod
load(path, name)[source]¶ Imports the specified
proteindb file from the hard disk.
Parameters: - path – filedirectory of the proteindb file
- name – filename without the file extension “.proteindb”
Note
This generates rather large files, which actually take longer to import than to newly generate. Maybe saving / loading should be limited to the protein database without in silico digestion information.
-
save(path, compress=True)[source]¶ Writes the
.proteins and .peptides entries to the hard disk as a proteindb file.
Note
If .save() is called and no proteindb file is present in the specified path, a new file is generated, otherwise the old file is replaced.
Parameters: - path – filedirectory to which the proteindb file is written. The output file name is specified by self.info['name']
- compress – bool, True to use zip file compression
-
class
maspy.proteindb.ProteinSequence(identifier, sequence, name='')[source]¶ Bases:
object
Describes a protein.
Variables: - id – identifier of the protein, for example a uniprot id.
- name – name of the protein
- sequence – amino acid sequence of the protein
- fastaHeader – str(), the protein's fasta header line
- fastaInfo – dict(), the interpreted fasta header as generated when using a fasta header parsing function, see fastaParseSgd().
- isUnique – bool, True if at least one unique peptide can be assigned to the protein
- uniquePeptides – a set of peptides which can be unambiguously assigned to this protein
- sharedPeptides – a set of peptides which are shared between different proteins
- coverageUnique – the number of amino acids in the protein sequence that are covered by unique peptides
- coverageShared – the number of amino acids in the protein sequence that are covered by unique or shared peptides
-
static
jsonHook(encoded)[source]¶ Custom JSON decoder that allows construction of a new
ProteinSequence instance from a decoded JSON object. Parameters: encoded – a JSON decoded object literal (a dict) Returns: “encoded” or ProteinSequence
-
maspy.proteindb.fastaParseSgd(header)[source]¶ Custom parser for fasta headers in the SGD format, see www.yeastgenome.org.
Parameters: header – str, protein entry header from a fasta file Returns: dict, parsed header
-
maspy.proteindb.importProteinDatabase(filePath, proteindb=None, decoyTag='[decoy]', contaminationTag='[cont]', headerParser=None, forceId=False, cleavageRule='[KR]', minLength=5, maxLength=40, missedCleavage=2, ignoreIsoleucine=False, removeNtermM=True)[source]¶ Generates a
ProteinDatabase by in silico digestion of proteins from a fasta file.
Parameters: - filePath – File path
- proteindb – optionally an existing ProteinDatabase can be specified, otherwise a new instance is generated and returned
- decoyTag – If a fasta file contains decoy protein entries, they should be specified with a sequence tag
- contaminationTag – If a fasta file contains contamination protein entries, they should be specified with a sequence tag
- headerParser – optional a headerParser can be specified #TODO: describe how a parser looks like
- forceId – bool, if True and no id can be extracted from the fasta header the whole header sequence is used as a protein id instead of raising an exception.
- cleavageRule – cleavage rule expressed in a regular expression, see maspy.constants.expasy_rules
- missedCleavage – number of allowed missed cleavage sites
- removeNtermM – bool, True to consider also peptides with the N-terminal Methionine of the protein removed
- minLength – int, only yield peptides with length >= minLength
- maxLength – int, only yield peptides with length <= maxLength
- ignoreIsoleucine – bool, if True treat Isoleucine and Leucine in peptide sequences as indistinguishable
maspy.reader module¶
This module provides functions to import various data types as maspy objects, which are associated with analysis workflows of mass spectrometry data. This currently comprises the mzML format, results of the percolator software and to some extent mzIdentML files, and the file formats representing peptide LC-MS features, “.featureXML” and “.features.tsv”.
-
maspy.reader.addSiiToContainer(siiContainer, specfile, siiList)[source]¶ Adds the
Sii elements contained in the siiList to the appropriate list in siiContainer.container[specfile].
Parameters: - siiContainer – instance of maspy.core.SiiContainer
- specfile – unambiguous identifier of a ms-run file. Is also used as a reference to other MasPy file containers.
- siiList – a list of Sii elements imported from any PSM search engine results
-
maspy.reader.applySiiQcValidation(siiContainer, specfile)[source]¶ Iterates over all Sii entries of a specfile in siiContainer and validates if they surpass a user defined quality threshold. The parameters for validation are defined in
siiContainer.info[specfile]: qcAttr, qcCutoff and qcLargerBetter
In addition to passing this validation, a Sii also has to be at the first list position in the siiContainer.container. If both criteria are met the attribute Sii.isValid is set to True.
Parameters: - siiContainer – instance of maspy.core.SiiContainer
- specfile – unambiguous identifier of a ms-run file. Is also used as a reference to other MasPy file containers.
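The two validation criteria can be sketched with plain Python objects. This is an illustration under assumptions, not the code of applySiiQcValidation(): the class Psm and the function applyQcSketch are hypothetical, the cutoff is assumed to be inclusive, and items that fail a criterion are assumed to be flagged with isValid = False.

```python
class Psm(object):
    """Hypothetical stand-in for Sii."""

    def __init__(self, **attributes):
        self.isValid = None
        self.__dict__.update(attributes)


def applyQcSketch(container, info):
    """container = {identifier: [Psm, ...]}, info as in siiContainer.info[specfile]."""
    for itemList in container.values():
        for position, item in enumerate(itemList):
            score = getattr(item, info['qcAttr'])
            if info['qcLargerBetter']:
                passed = score >= info['qcCutoff']  # assumed inclusive
            else:
                passed = score <= info['qcCutoff']  # assumed inclusive
            # Valid only if the cutoff is passed AND the item is listed first.
            item.isValid = passed and position == 0


info = {'qcAttr': 'qValue', 'qcLargerBetter': False, 'qcCutoff': 0.01}
container = {
    '1': [Psm(qValue=0.001), Psm(qValue=0.002)],
    '2': [Psm(qValue=0.5)],
}
applyQcSketch(container, info)
flags = [[psm.isValid for psm in container[key]] for key in ('1', '2')]
```

The second entry of spectrum '1' passes the cutoff but is not in the first list position, so it is not flagged as valid.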
-
maspy.reader.applySiiRanking(siiContainer, specfile)[source]¶ Iterates over all Sii entries of a specfile in siiContainer and sorts Sii elements of the same spectrum according to the score attribute specified in
siiContainer.info[specfile]['rankAttr']. Sorted Sii elements are then ranked according to their sorted position. If multiple Sii have the same score, all get the same rank and the next entry's rank is its list position.
Parameters: - siiContainer – instance of maspy.core.SiiContainer
- specfile – unambiguous identifier of a ms-run file. Is also used as a reference to other MasPy file containers.
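The ranking rule described above (equal scores share a rank, the next entry's rank is its list position) corresponds to what is often called standard competition ranking. A minimal sketch, assuming a hypothetical Psm class and a hypothetical function rankSketch rather than the code of applySiiRanking():

```python
class Psm(object):
    """Hypothetical stand-in for Sii."""

    def __init__(self, score):
        self.score = score
        self.rank = None


def rankSketch(itemList, rankAttr, rankLargerBetter=True):
    """Sort itemList in place by rankAttr and assign item.rank."""
    itemList.sort(key=lambda item: getattr(item, rankAttr),
                  reverse=rankLargerBetter)
    previousScore, previousRank = None, None
    for position, item in enumerate(itemList, start=1):
        score = getattr(item, rankAttr)
        if position > 1 and score == previousScore:
            # Same score as the previous entry: share its rank.
            item.rank = previousRank
        else:
            # Otherwise the rank is the one based list position.
            item.rank = position
        previousScore, previousRank = score, item.rank


psms = [Psm(10), Psm(30), Psm(30), Psm(20)]
rankSketch(psms, 'score')
ranked = [(psm.score, psm.rank) for psm in psms]
```

After two entries tied at rank 1, the following entry receives rank 3, so rank 2 is left unassigned.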
-
maspy.reader.convertMzml(mzmlPath, outputDirectory=None)[source]¶ Imports an mzml file and converts it to a MsrunContainer file
Parameters: - mzmlPath – path of the mzml file
- outputDirectory – directory where the MsrunContainer file should be written. If it is not specified, the output directory is set to the mzML file's directory.
-
maspy.reader.defaultFetchSiAttrFromSmi(smi, si)[source]¶ Default method to extract attributes from a spectrum metadata item (smi) and add them to a spectrum item (si).
-
maspy.reader.importMsgfMzidResults(siiContainer, filelocation, specfile=None, qcAttr='eValue', qcLargerBetter=False, qcCutoff=0.01, rankAttr='score', rankLargerBetter=True)[source]¶ Import peptide spectrum matches (PSMs) from a MS-GF+ mzIdentML file, generate Sii elements and store them in the specified siiContainer. Imported Sii are ranked according to a specified attribute and validated if they surpass a specified quality threshold.
Parameters: - siiContainer – imported PSM results are added to this instance of siiContainer
- filelocation – file path of the MS-GF+ mzIdentML file
- specfile – optional, unambiguous identifier of a ms-run file. Is also used as a reference to other MasPy file containers. If specified, the attribute .specfile of all Sii is set to this value, else it is read from the mzIdentML file.
- qcAttr – name of the parameter to define a quality cut off. Typically this is some sort of a global false positive estimator (e.g. FDR).
- qcLargerBetter – bool, True if a large value for the .qcAttr means a higher confidence.
- qcCutoff – float, the quality threshold for the specified .qcAttr
- rankAttr – name of the parameter used for ranking Sii according to how well they match to a fragment ion spectrum, in the case when there are multiple Sii present for the same spectrum.
- rankLargerBetter – bool, True if a large value for the .rankAttr means a better match to the fragment ion spectrum.
For details on Sii ranking see applySiiRanking().
For details on Sii quality validation see applySiiQcValidation().
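How qcLargerBetter and qcCutoff interact can be shown with a short sketch. It is only an illustration and not the code of applySiiQcValidation(): the helper passes_qc, the dictionary-based PSM records, and the choice of an inclusive comparison at the cutoff are assumptions.

```python
def passes_qc(value, cutoff, larger_better):
    """Return True if a quality value is at least as good as the cutoff.

    Assumption: a value exactly equal to the cutoff is accepted.
    """
    if larger_better:
        return value >= cutoff
    return value <= cutoff


# Hypothetical PSM records with an e-value style attribute, where a
# smaller value means higher confidence (qcLargerBetter=False).
psms = [
    {"id": "scan=1", "eValue": 0.001},
    {"id": "scan=2", "eValue": 0.2},
    {"id": "scan=3", "eValue": 0.01},
]
valid = [psm["id"] for psm in psms
         if passes_qc(psm["eValue"], cutoff=0.01, larger_better=False)]
print(valid)
```

Keeping the direction of the comparison in a single boolean lets the same cutoff logic serve attributes such as e-values or q-values (smaller is better) and score-like attributes (larger is better).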
-
maspy.reader.importMzml(filepath, msrunContainer=None, siAttrFromSmi=None, specfilename=None)[source]¶ Performs a complete import of an mzML file into a maspy MsrunContainer.
Parameters: - siAttrFromSmi – allows specifying a custom function that extracts params from a spectrumMetadataItem
- specfilename – by default the filename will be used as the specfilename in the MsrunContainer and all mzML item instances, specify here an alternative specfilename to override the default one
-
maspy.reader.importPeptideFeatures(fiContainer, filelocation, specfile)[source]¶ Import peptide features from a featureXml file, as generated for example by the OpenMS node featureFinderCentroided, or a features.tsv file by the Dinosaur command line tool.
Parameters: - fiContainer – imported features are added to this instance of FeatureContainer.
- filelocation – actual file path
- specfile – keyword (filename) to represent the file in the FeatureContainer. Each filename can only occur once, therefore importing the same filename again is prevented.
-
maspy.reader.importPercolatorResults(siiContainer, filelocation, specfile, psmEngine, qcAttr='qValue', qcLargerBetter=False, qcCutoff=0.01, rankAttr='score', rankLargerBetter=True)[source]¶ Import peptide spectrum matches (PSMs) from a percolator result file, generate Sii elements and store them in the specified siiContainer. Imported Sii are ranked according to a specified attribute and validated if they surpass a specified quality threshold.
Parameters: - siiContainer – imported PSM results are added to this instance of siiContainer
- filelocation – file path of the percolator result file
- specfile – unambiguous identifier of a ms-run file. Is also used as a reference to other MasPy file containers.
- psmEngine – PSM search engine used for peptide spectrum matching before percolator. For details see readPercolatorResults(). Possible values are ‘comet’, ‘xtandem’, ‘msgf’.
- qcAttr – name of the parameter to define a quality cut off. Typically this is some sort of a global false positive estimator (e.g. FDR).
- qcLargerBetter – bool, True if a large value for the .qcAttr means a higher confidence.
- qcCutoff – float, the quality threshold for the specified .qcAttr
- rankAttr – name of the parameter used for ranking Sii according to how well they match to a fragment ion spectrum, in the case when there are multiple Sii present for the same spectrum.
- rankLargerBetter – bool, True if a large value for the .rankAttr means a better match to the fragment ion spectrum.
For details on Sii ranking see applySiiRanking().
For details on Sii quality validation see applySiiQcValidation().
maspy.reader.prepareSiiImport(siiContainer, specfile, path, qcAttr, qcLargerBetter, qcCutoff, rankAttr, rankLargerBetter)[source]¶ Prepares the siiContainer for the import of peptide spectrum matching results. Adds entries to siiContainer.container and to siiContainer.info.
Parameters: - siiContainer – instance of maspy.core.SiiContainer
- specfile – unambiguous identifier of a ms-run file. Is also used as a reference to other MasPy file containers.
- path – folder location used by the SiiContainer to save and load data to the hard disk.
- qcAttr – name of the parameter to define a Sii quality cut off. Typically this is some sort of a global false positive estimator, for example a ‘false discovery rate’ (FDR).
- qcLargerBetter – bool, True if a large value for the .qcAttr means a higher confidence.
- qcCutoff – float, the quality threshold for the specified .qcAttr
- rankAttr – name of the parameter used for ranking Sii according to how well they match to a fragment ion spectrum, in the case when there are multiple Sii present for the same spectrum.
- rankLargerBetter – bool, True if a large value for the .rankAttr means a better match to the fragment ion spectrum.
For details on Sii ranking see applySiiRanking().
For details on Sii quality validation see applySiiQcValidation().
maspy.reader.readMsgfMzidResults(filelocation, specfile=None)[source]¶ Reads MS-GF+ PSM results from a mzIdentML file and returns a list of Sii elements.
Parameters: - filelocation – file path of the MS-GF+ mzIdentML file
- specfile – optional, unambiguous identifier of a ms-run file. Is also used as a reference to other MasPy file containers. If specified, the .specfile attribute of all Sii is set to this value, else it is read from the mzIdentML file.
Returns: [sii, sii, sii, ...]
-
maspy.reader.readPercolatorResults(filelocation, specfile, psmEngine)[source]¶ Reads percolator PSM results from a txt file and returns a list of Sii elements.
Parameters: - filelocation – file path of the percolator result file
- specfile – unambiguous identifier of a ms-run file. Is also used as a reference to other MasPy file containers.
- psmEngine – PSM search engine used for peptide spectrum matching before percolator. This is important to specify, since the scanNr information is written in a different format by some engines. It might be necessary to adjust the settings for different versions of percolator or the PSM search engines used. Possible values are ‘comet’, ‘xtandem’, ‘msgf’.
Returns: [sii, sii, sii, ...]
maspy.writer module¶
Provides the possibility to write a new mzML file from an MsrunContainer instance, which is the maspy representation of a specfile.
-
maspy.writer.writeMzml(specfile, msrunContainer, outputdir, spectrumIds=None, chromatogramIds=None, writeIndex=True)[source]¶ #TODO: docstring
Parameters: - specfile – #TODO docstring
- msrunContainer – #TODO docstring
- outputdir – #TODO docstring
- spectrumIds – #TODO docstring
- chromatogramIds – #TODO docstring
-
maspy.writer.xmlChromatogramFromCi(index, ci, compression='zlib')[source]¶ #TODO: docstring
Parameters: - index – #TODO: docstring
- ci – #TODO: docstring
- compression – #TODO: docstring
Returns: #TODO: docstring
-
maspy.writer.xmlGenBinaryDataArrayList(binaryDataInfo, binaryDataDict, compression='zlib', arrayTypes=None)[source]¶ #TODO: docstring
Parameters: - binaryDataInfo – #TODO: docstring
- binaryDataDict – #TODO: docstring
- compression – #TODO: docstring
- arrayTypes – #TODO: docstring
Returns: #TODO: docstring
-
maspy.writer.xmlGenPrecursorList(precursorList)[source]¶ #TODO: docstring
Parameters: precursorList – #TODO: docstring
Returns: #TODO: docstring
-
maspy.writer.xmlGenProductList(productList)[source]¶ #TODO: docstring
Parameters: productList – #TODO: docstring
Returns: #TODO: docstring
-
maspy.writer.xmlGenScanList(scanList, scanListParams)[source]¶ #TODO: docstring
Parameters: - scanList – #TODO: docstring
- scanListParams – #TODO: docstring
Returns: #TODO: docstring
-
maspy.writer.xmlSpectrumFromSmi(index, smi, sai=None, compression='zlib')[source]¶ #TODO: docstring
Parameters: - index – The zero-based, consecutive index of the spectrum in the SpectrumList. (mzML specification)
- smi – a SpectrumMetadataItem instance
- sai – a SpectrumArrayItem instance, if none is specified no binaryDataArrayList is written
- compression – #TODO: docstring
Returns: #TODO: docstring
maspy.sil module¶
The class LabelDescriptor allows the specification of a labeling strategy with stable isotopic labels. It can then be used to determine the labeling state of a given peptide and calculate the expected mass of alternative labeling states.
-
class maspy.sil.LabelDescriptor[source]¶ Bases: object
Describes an MS1 stable isotope label setup for quantification.
Variables: - labels – Contains a dictionary with all possible label states, keys are increasing integers starting from 0, which correspond to the different label states.
- excludingModifictions – bool, True if any label has specified excludingModifications
-
addLabel(aminoAcidLabels, excludingModifications=None)[source]¶ Adds a new label state.
Parameters: - aminoAcidLabels – describes which amino acids can bear which labels. Possible keys are the amino acids in one letter code and ‘nTerm’, ‘cTerm’. Possible values are the modification ids from maspy.constants.aaModMass as strings or a list of strings. An example for one expected label at the n-terminus and two expected labels at each Lysine: {'nTerm': 'u:188', 'K': ['u:188', 'u:188']}
- excludingModifications – optional, a dictionary that describes which modifications can prevent the addition of labels. Keys and values have to be the modification ids from maspy.constants.aaModMass. The key specifies the modification that prevents the label modification specified by the value. For example, for each modification ‘u:1’ that is present at an amino acid or terminus of a peptide, the number of expected labels at this position is reduced by one: {'u:1':'u:188'}
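The effect of excludingModifications on the number of expected labels at one position can be sketched as follows. The helper expected_labels is hypothetical and does not reproduce MasPy's code; it only applies the rule stated above to the example dictionaries from the parameter descriptions.

```python
def expected_labels(position_labels, present_mods, excluding_mods=None):
    """Return the sorted list of labels still expected at one position.

    position_labels: a label id or a list of label ids, as in the
        values of aminoAcidLabels.
    present_mods: modification ids already present at this position.
    excluding_mods: {blocking modification id: blocked label id}.
    """
    if isinstance(position_labels, str):
        labels = [position_labels]
    else:
        labels = list(position_labels)
    excluding_mods = excluding_mods or {}
    for mod in present_mods:
        blocked = excluding_mods.get(mod)
        if blocked in labels:
            labels.remove(blocked)    # each blocking mod removes one label
    return sorted(labels)


amino_acid_labels = {'nTerm': 'u:188', 'K': ['u:188', 'u:188']}
excluding = {'u:1': 'u:188'}

# A lysine that already carries one 'u:1' can take one label fewer.
print(expected_labels(amino_acid_labels['K'], ['u:1'], excluding))
```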
-
maspy.sil.expectedLabelPosition(peptide, labelStateInfo, sequence=None, modPositions=None)[source]¶ Returns a modification description of a certain label state of a peptide.
Parameters: - peptide – peptide sequence used to calculate the expected label state modifications
- labelStateInfo – an entry of LabelDescriptor.labels that describes a label state
- sequence – unmodified amino acid sequence of “peptide”, if None it is generated by maspy.peptidemethods.removeModifications()
- modPositions – dictionary describing the modification state of “peptide”, if None it is generated by maspy.peptidemethods.returnModPositions()
Returns: {sequence position: sorted list of expected label modifications on that position, ...}
-
maspy.sil.modAminoacidsFromLabelInfo(labelDescriptor)[source]¶ Returns a set of all amino acids and termini which can bear a label, as described in “labelDescriptor”.
Parameters: labelDescriptor – LabelDescriptor, describes the label setup of an experiment
Returns: #TODO: docstring
-
maspy.sil.modSymbolsFromLabelInfo(labelDescriptor)[source]¶ Returns a set of all modification symbols which were used in the labelDescriptor.
Parameters: labelDescriptor – LabelDescriptor, describes the label setup of an experiment
Returns: #TODO: docstring
-
maspy.sil.returnLabelState(peptide, labelDescriptor, labelSymbols=None, labelAminoacids=None)[source]¶ Calculates the label state of a given peptide for the label setup described in labelDescriptor.
Parameters: - peptide – peptide for which the label state should be calculated
- labelDescriptor – LabelDescriptor, describes the label setup of an experiment.
- labelSymbols – modifications that show a label, as returned by modSymbolsFromLabelInfo().
- labelAminoacids – amino acids that can bear a label, as returned by modAminoacidsFromLabelInfo().
Returns: integer that shows the label state:
>=0: predicted label state of the peptide
-1: peptide sequence can’t bear any labelState modifications
-2: peptide modifications don’t fit to any predicted labelState
-3: peptide modifications fit to a predicted labelState, but not all predicted labelStates are distinguishable
-
maspy.sil.returnLabelStateMassDifferences(peptide, labelDescriptor, labelState=None, sequence=None)[source]¶ Calculates the mass difference for alternative possible label states of a given peptide. See also LabelDescriptor, returnLabelState()
Parameters: - peptide – peptide to calculate alternative label states
- labelDescriptor – LabelDescriptor, describes the label setup of an experiment
- labelState – label state of the peptide, if None it is calculated by returnLabelState()
- sequence – unmodified amino acid sequence of the “peptide”, if None it is generated by maspy.peptidemethods.removeModifications()
Returns: {alternativeLabelState: massDifference, ...} or {} if the peptide label state is -1.
Note
The massDifference plus the peptide mass is the expected mass of an alternatively labeled peptide.
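The note can be turned into a short calculation. The sketch below takes a dictionary shaped like the documented return value and adds each mass difference to the peptide mass. The helper name expected_masses and all numeric values are made up for illustration and are not produced by MasPy.

```python
def expected_masses(peptide_mass, mass_differences):
    """Map each alternative label state to its expected peptide mass."""
    return {state: peptide_mass + difference
            for state, difference in mass_differences.items()}


# Illustrative values only: a peptide observed in label state 0 and the
# mass differences to two alternative label states.
peptide_mass = 1200.5
mass_differences = {1: 4.0, 2: 8.0}

print(expected_masses(peptide_mass, mass_differences))
```

An empty dictionary, as returned for a peptide with label state -1, simply yields an empty result.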
maspy.xml module¶
#TODO: module description
-
class maspy.xml.MzmlReader(mzmlPath)[source]¶ Bases: object
#TODO: docstring
Variables: - mzmlPath – #TODO: docstring
- metadataNode – #TODO: docstring
- chromatogramList – #TODO: docstring
-
maspy.xml.binaryDataArrayTypes= {'MS:1000617': 'lambda', 'MS:1000595': 'rt', 'MS:1000516': 'z', 'MS:1000820': 'flow', 'MS:1000821': 'pressure', 'MS:1000514': 'mz', 'MS:1000822': 'temperature', 'MS:1000515': 'i', 'MS:1000786': 'non-standard', 'MS:1000517': 'sn'}¶ #TODO: docstring
-
maspy.xml.clearParsedElements(element)[source]¶ Deletes an element and all linked parent elements.
This function is used to save memory while iteratively parsing an xml file by removing already processed elements.
Parameters: element – #TODO docstring
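The memory-saving idea behind clearParsedElements() can be sketched with the standard library. This is an analogue under stated assumptions, not MasPy's function: it uses xml.etree.ElementTree instead of lxml, the helper name collect_ids and the sample document are hypothetical, and it only empties each processed element rather than also deleting linked parent elements.

```python
import io
import xml.etree.ElementTree as ET


def collect_ids(xml_bytes, tag):
    """Iteratively parse xml_bytes and return the 'id' of every `tag` element.

    Each matching element is cleared once it has been processed, which
    releases its children, text and attributes while parsing continues.
    """
    ids = []
    for _event, element in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        if element.tag == tag:
            ids.append(element.get("id"))   # read what is needed first
            element.clear()                 # then free the element's content
    return ids


sample = b"<run><spectrum id='1'><param/></spectrum><spectrum id='2'/></run>"
print(collect_ids(sample, "spectrum"))
```

Reading the required data before clearing matters, because clear() also removes the attributes of the element.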
-
maspy.xml.clearTag(tag)[source]¶ #TODO: docstring, e.g. “{http://psi.hupo.org/ms/mzml}mzML” returns “mzML”
Parameters: tag – #TODO docstring
Returns:
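The behaviour shown in the clearTag() example, removing the namespace prefix from a tag, can be reproduced with a one-line helper. The name strip_namespace is hypothetical and the sketch is not necessarily how MasPy implements it.

```python
def strip_namespace(tag):
    """Remove a leading '{namespace}' part from an xml tag, if present."""
    return tag.split('}', 1)[-1]


print(strip_namespace("{http://psi.hupo.org/ms/mzml}mzML"))
```

Tags without a namespace are returned unchanged, since split() then yields a single item.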
-
maspy.xml.cvParamFromDict(attributes)[source]¶ Python representation of a mzML cvParam = tuple(accession, value, unitAccession).
Parameters: attributes – #TODO: docstring
Returns: #TODO: docstring
-
maspy.xml.decodeBinaryData(binaryData, arrayLength, bitEncoding, compression)[source]¶ Function to decode a mzML byte array into a numpy array. This is the inverse function of encodeBinaryData(). Concept inherited from pymzml.spec.Spectrum._decode() of the python library pymzML.
Parameters: - binaryData – #TODO: docstring
- arrayLength – #TODO: docstring
- bitEncoding – #TODO: docstring
- compression – #TODO: docstring
Returns: #TODO: docstring
-
maspy.xml.encodeBinaryData(dataArray, bitEncoding, compression)[source]¶ Function to encode a numpy.array into a mzML byte array. This is the inverse function of decodeBinaryData().
Parameters: - dataArray – #TODO: docstring
- bitEncoding – #TODO: docstring
- compression – #TODO: docstring
Returns: #TODO: docstring
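The relationship between encodeBinaryData() and decodeBinaryData() can be illustrated with a round trip. mzML stores binary arrays as base64 text holding little-endian 32 or 64 bit floats, optionally zlib compressed. The sketch below is not MasPy's code: it uses the standard library struct module in place of numpy, and the names encode_values and decode_values as well as their default arguments are assumptions.

```python
import base64
import struct
import zlib


def _struct_format(count, bit_encoding):
    # '<' selects little-endian; 'd' is a 64 bit float, 'f' a 32 bit float.
    return "<%d%s" % (count, "d" if bit_encoding == "64" else "f")


def encode_values(values, bit_encoding="64", compression="zlib"):
    """Pack floats, optionally compress them, and return base64 text."""
    raw = struct.pack(_struct_format(len(values), bit_encoding), *values)
    if compression == "zlib":
        raw = zlib.compress(raw)
    return base64.b64encode(raw).decode("ascii")


def decode_values(text, array_length, bit_encoding="64", compression="zlib"):
    """Inverse of encode_values; returns a list of floats."""
    raw = base64.b64decode(text)
    if compression == "zlib":
        raw = zlib.decompress(raw)
    return list(struct.unpack(_struct_format(array_length, bit_encoding), raw))


mz_values = [100.0, 200.5, 300.25]
encoded = encode_values(mz_values)
print(decode_values(encoded, len(mz_values)) == mz_values)
```

The array length has to be passed to the decoder because the byte string alone does not say how many values it contains until the bit encoding is known.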
-
maspy.xml.extractBinaries(binaryDataArrayList, arrayLength)[source]¶ #TODO: docstring
Parameters: - binaryDataArrayList – #TODO: docstring
- arrayLength – #TODO: docstring
Returns: #TODO: docstring
-
maspy.xml.extractParams(xmlelement)[source]¶ #TODO docstring
Parameters: xmlelement – #TODO docstring
Returns: #TODO docstring
-
maspy.xml.findBinaryDataType(params)[source]¶ #TODO: docstring
From http://www.peptideatlas.org/tmp/mzML1.1.0.html#binaryDataArray: a binaryDataArray “MUST supply a child term of MS:1000518 (binary data type) only once”
Parameters: params – #TODO: docstring
Returns: #TODO: docstring
-
maspy.xml.findParam(params, targetValue)[source]¶ Returns a param entry (cvParam or userParam) from a list of params if its ‘accession’ (cvParam) or ‘name’ (userParam) matches the targetValue. Returns the cvParam or userParam, or None if no matching param was found.
Parameters: - params – #TODO: docstring
- targetValue – #TODO: docstring
Returns: #TODO: docstring
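Given the tuple layouts documented for cvParamFromDict() and userParamFromDict(), where the accession or the name is the first tuple entry, the lookup can be sketched as a comparison against that first entry. The helper find_param and the sample params are hypothetical and are not taken from MasPy's source.

```python
def find_param(params, target_value):
    """Return the first param tuple whose first entry equals target_value.

    cvParam tuples start with the accession and userParam tuples start
    with the name, so one comparison covers both kinds.
    """
    for param in params:
        if param[0] == target_value:
            return param
    return None


params = [
    ('MS:1000514', '', None),                 # cvParam: accession, value, unitAccession
    ('comment', 'test', None, 'xsd:string'),  # userParam: name, value, unitAccession, type
]
print(find_param(params, 'MS:1000514'))
print(find_param(params, 'missing'))
```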
-
maspy.xml.getParam(xmlelement)[source]¶ Converts an mzML xml element to a param tuple.
Parameters: xmlelement – #TODO docstring
Returns: a param tuple or False if the xmlelement is not a parameter (‘userParam’, ‘cvParam’ or ‘referenceableParamGroupRef’)
-
maspy.xml.interpretBitEncoding(bitEncoding)[source]¶ Returns a floattype string and a numpy array type.
Parameters: bitEncoding – must be either ‘64’ or ‘32’
Returns: (floattype, numpyType)
-
maspy.xml.recClearTag(element)[source]¶ Applies maspy.xml.clearTag() to the tag attribute of the “element” and recursively to all child elements.
Parameters: element – an instance of xml.etree.Element
-
maspy.xml.recCopyElement(oldelement)[source]¶ Generates a copy of an xml element and recursively of all child elements.
Parameters: oldelement – an instance of lxml.etree._Element
Returns: a copy of the “oldelement”
Warning
Doesn’t copy .text or .tail of xml elements.
-
maspy.xml.recRemoveTreeFormating(element)[source]¶ Removes whitespace characters, which are leftovers from previous xml formatting. str.strip() is applied to the “text” and the “tail” attribute of the element and recursively to all child elements.
Parameters: element – an instance of lxml.etree._Element
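The recursive stripping described for recRemoveTreeFormating() can be sketched with the standard library. The helper strip_formatting is hypothetical; it uses xml.etree.ElementTree rather than lxml and assumes that missing text or tail values (None) are left untouched.

```python
import xml.etree.ElementTree as ET


def strip_formatting(element):
    """Apply str.strip() to text and tail of element and all descendants."""
    if element.text is not None:
        element.text = element.text.strip()
    if element.tail is not None:
        element.tail = element.tail.strip()
    for child in element:
        strip_formatting(child)


# Indentation and line breaks end up in .text and .tail after parsing.
root = ET.fromstring("<a>\n  <b> x </b>\n  <c/>\n</a>")
strip_formatting(root)
print(repr(root.text), repr(root[0].text), repr(root[0].tail))
```

Note that stripping also removes leading and trailing whitespace from real text content, such as the spaces around "x" in the example.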
-
maspy.xml.refParamGroupFromDict(attributes)[source]¶ Python representation of a mzML referencableParamGroup = (‘ref’, ref)
Parameters: attributes – #TODO: docstring
Returns: #TODO: docstring
Note
Although the mzML element referencableParamGroups is imported, its utilization is currently not implemented in MasPy.
-
maspy.xml.userParamFromDict(attributes)[source]¶ Python representation of a mzML userParam = tuple(name, value, unitAccession, type)
Parameters: attributes – #TODO: docstring
Returns: #TODO: docstring
-
maspy.xml.xmlAddParams(parentelement, params)[source]¶ Generates new mzML parameter xml elements and adds them to the ‘parentelement’ as xml child elements.
Parameters: - parentelement – xml.etree.Element, an mzML element
- params – a list of mzML parameter tuples (‘cvParam’, ‘userParam’ or ‘referencableParamGroup’)