Input form (basic)
Mass spectra
Upload the MS spectra files : PAMPA can process MALDI-TOF and MALDI-FTICR spectra. In all cases, we recommend deisotoping the mass spectra before processing them.
It can recognize the following formats, by the extension of the file name:
- CSV format: It consists of two columns. The first column is designated for mass (m/z), and the second column records intensity (I). Columns are separated by either a comma (',') or a semicolon (';'). The initial row serves as the header.
- MGF format: Mascot Generic Format
- mzML format: see https://www.psidev.info/mzML
Users can upload several files. It is also possible to provide a ZIP archive containing all files.
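As an illustration of the CSV layout described above, here is a minimal Python sketch that reads a two-column spectrum. It is not PAMPA's own parser: the function name `read_csv_spectrum` is hypothetical, and guessing the delimiter from the header row is an assumption made for this example.

```python
import csv
import io

def read_csv_spectrum(text):
    """Parse a two-column (m/z, intensity) CSV spectrum with a header row."""
    lines = text.strip().splitlines()
    # Assumption: the header row reveals the delimiter (';' if present, else ',').
    delimiter = ";" if ";" in lines[0] else ","
    reader = csv.reader(io.StringIO("\n".join(lines)), delimiter=delimiter)
    next(reader)  # skip the header row
    peaks = []
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or incomplete rows
        peaks.append((float(row[0]), float(row[1])))
    return peaks

example = "mass;intensity\n1105.58;3500\n1427.70;1200\n"
print(read_csv_spectrum(example))
```

The sketch assumes a dot as the decimal separator; files that use a decimal comma would need an extra conversion step.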
Mass error : The error margin is related to the resolution of the mass spectrometer, that is its ability to distinguish closely spaced peaks. We employ it to set an upper bound on the deviation between a peak and the theoretical mass of the associated peptide.
- Optimize for MALDI-TOF spectra: This option corresponds to a value of 50 ppm.
- Optimize for MALDI-FTICR spectra: This option corresponds to a value of 5 ppm.
- Custom value in ppm: Enter any value between 1 and 1000
- Custom value in Daltons : Enter any value between 0.002 and 0.998
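To make the two units concrete, the following sketch checks whether an observed peak lies within the allowed deviation of a theoretical peptide mass. The function name `within_tolerance` is hypothetical, and computing the ppm window relative to the theoretical mass is an assumption; PAMPA's internal computation may differ.

```python
def within_tolerance(observed, theoretical, tolerance, unit="ppm"):
    """Return True if the observed m/z matches the theoretical mass."""
    if unit == "ppm":
        # Assumption: ppm is taken relative to the theoretical mass.
        max_deviation = theoretical * tolerance / 1e6
    elif unit == "Da":
        max_deviation = tolerance
    else:
        raise ValueError("unit must be 'ppm' or 'Da'")
    return abs(observed - theoretical) <= max_deviation

# At about 1105.58 Da, 50 ppm is roughly 0.055 Da and 5 ppm roughly 0.0055 Da.
print(within_tolerance(1105.62, 1105.58, 50))         # MALDI-TOF setting
print(within_tolerance(1105.62, 1105.58, 5))          # MALDI-FTICR setting
print(within_tolerance(1105.62, 1105.58, 0.1, "Da"))  # custom value in Daltons
```

A ppm tolerance widens with the mass, whereas a tolerance in Daltons stays constant over the whole spectrum.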
Results
- Only optimal results : with this option, PAMPA identifies the species with the smallest P-value for each mass spectrum.
- Near-optimal results within a suboptimality percentage : this option also returns near-optimal solutions. You can set the suboptimality range as a percentage from 0 to 100; the default is 100 (corresponding to solutions with the highest number of marker peptides). For example, if the optimal solution has 11 marker peptides, a value of 80 will provide solutions with 9 markers or more.
- All results within a suboptimality percentage : this option is linked to the previous option and modifies its behavior. When the previous option is used alone, it generates only near-optimal solutions that are not included in any other solution. This option makes the program compute all solutions, even those that are included in other solutions.
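The example with 11 marker peptides and a value of 80 is consistent with rounding the threshold up (80% of 11 is 8.8, hence 9 markers). The sketch below encodes that reading; the rounding rule is inferred from this single example and is an assumption, and `min_markers` is a hypothetical name.

```python
def min_markers(optimal, percentage):
    """Smallest marker count kept for a given suboptimality percentage."""
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")
    # Ceiling of optimal * percentage / 100, done in integer arithmetic
    # to avoid floating-point rounding surprises.
    return -(-optimal * percentage // 100)

print(min_markers(11, 80))   # threshold for the example in the text
print(min_markers(11, 100))  # default: only solutions with the optimal count
```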
Advanced analysis
Peptide tables
Peptide markers are organized within peptide tables, which are TSV files where each column corresponds to a field. Twelve fields are recognized by the program.
- Rank : Taxonomic rank
- Taxid : Taxonomic identifier
- Taxon name : Scientific name
- Sequence : Marker peptide sequence
- PTM : Description of post-translational modifications applied to the marker peptide (see PTM description section)
- Name : Marker name
- Mass : Peptide mass
- Gene : Gene name, e.g., COL1A1
- SeqId : Sequence identifier(s) of the protein sequence from which the marker peptide is derived
- Begin : Start position of the peptide marker within the protein sequence
- End : End position of the peptide marker within the protein sequence
- Comment : Additional comments about the marker
The first row of the file should contain column headings.
Most of these fields are optional and are here for reference. The following information is mandatory:
- You must provide a taxid for the peptide marker. Rank and taxon names are included primarily to enhance the clarity of results.
- You should furnish either a sequence, possibly with a PTM description, or a mass for your marker peptide. If the sequence is provided without a mass, the program will automatically compute the mass from it. To do so, it will utilize either the PTM description (when available) or infer potential PTMs from the sequence.
Lastly, you have the option to include additional fields (i.e., extra columns) for your own purposes. These fields will be disregarded by PAMPA.
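The mandatory-field rules above can be checked before submitting a table. The sketch below is one possible validation, not PAMPA's own code: the name `load_peptide_table` is hypothetical, and it assumes the column headings are spelled exactly as the field names listed above.

```python
import csv
import io

def load_peptide_table(text):
    """Read a TSV peptide table and check the mandatory fields."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    markers = []
    # Data rows start on line 2, after the header row.
    for line_no, row in enumerate(reader, start=2):
        taxid = (row.get("Taxid") or "").strip()
        sequence = (row.get("Sequence") or "").strip()
        mass = (row.get("Mass") or "").strip()
        if not taxid:
            raise ValueError(f"line {line_no}: a taxid is mandatory")
        if not sequence and not mass:
            raise ValueError(f"line {line_no}: provide a sequence or a mass")
        markers.append({
            "taxid": taxid,
            "sequence": sequence or None,
            "ptm": (row.get("PTM") or "").strip() or None,
            "mass": float(mass) if mass else None,
        })
    return markers

# Illustrative rows only; the extra "Lab note" column is simply ignored.
table = (
    "Taxid\tSequence\tPTM\tMass\tLab note\n"
    "9913\tGPPGPPGK\t2O\t\tchecked\n"
    "9913\t\t\t1105.58\t\n"
)
for marker in load_peptide_table(table):
    print(marker)
```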
Where to find peptide tables, how to generate them? An example peptide table for mammals is accessible here. You can manually edit these peptide table files or create your own using any spreadsheet software and opting for the TSV export format.
Alternatively, PAMPA CRAFT offers automated methods for generating peptide tables.
FASTA sequences
PAMPA processes amino-acid sequences in the standard FASTA format with a UniProtKB-like header. The first line starts with a greater-than character (>) followed by a sequence identifier (SeqID), which is provided for informational purposes and can be customized by the user. Additionally, this line must contain three mandatory fields :
- OS: scientific name of the organism
- OX: taxonomic identifier of the organism, as assigned by the NCBI
- GN: gene name
The other lines are the sequence representation, with one letter per amino acid.
For example:
>P02453 OS=Bos taurus OX=9913 GN=COL1A1
MFSFVDLRLLLLLAATALLTHGQEEGQEEGQEEDIPPVTCVQNGLRYHDRDVWKPVPCQI
CVCDNGNVLCDDVICDELKDCPNAKVPTDECCPVCPEGQESPTDQETTGVEGPKGDTGPR
GPRGPAGPPGRDGIPGQPGLPGPPGPPGPPGPPGLGGNFAPQLSYGYDEKSTGISVPGPM
GPSGPRGLPGPPGAPGPQGFQGPPGEPGEPGASGPMGPRGPPGPPGKNGDDGEAGKPGRP
GERGPPGPQGARGLPGTAGLPGMKGHRGFSGLDGAKGDAGPAGPKGEPGSPGENGAPGQM
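A header such as the one above can be checked for the three mandatory fields with a short script. This is a sketch only: `parse_header` is a hypothetical helper, and it assumes that each field value runs until the next two-letter `XX=` tag or the end of the line.

```python
import re

MANDATORY = ("OS", "OX", "GN")

def parse_header(header):
    """Extract the SeqID and the OS, OX and GN fields from a FASTA header."""
    if not header.startswith(">"):
        raise ValueError("a FASTA header must start with '>'")
    parts = header[1:].split(maxsplit=1)
    if not parts:
        raise ValueError("missing sequence identifier")
    fields = {"SeqID": parts[0]}
    # A value ends where the next two-letter "XX=" tag begins, or at end of line.
    pattern = r"\b(OS|OX|GN)=(.*?)(?=\s+[A-Z]{2}=|$)"
    for key, value in re.findall(pattern, header):
        fields[key] = value.strip()
    missing = [key for key in MANDATORY if key not in fields]
    if missing:
        raise ValueError("missing mandatory field(s): " + ", ".join(missing))
    return fields

print(parse_header(">P02453 OS=Bos taurus OX=9913 GN=COL1A1"))
```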
Taxonomy
The program offers the optional possibility to add taxonomic information to the species identification. In this case, you can use the file provided or submit your own.
The taxonomy must be in the form of a Tab-Separated Values (TSV) file comprising five columns: Taxid, Common name, Scientific name, Parent (taxid), and Rank (species, genus, etc.). You can obtain this type of file directly from UniProt (https://www.uniprot.org/taxonomy) by following these steps:
- Use the search bar to find your desired clade, entering its common name, scientific name, or taxid.
- Select the clade of interest and click on 'Browse all descendants.'
- Locate the 'download' link.
- Choose the TSV format and customize the columns in the following order: Common name, Scientific name, Parent, and Rank.
- Proceed to download the taxonomy file.
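Once downloaded, the five-column file can be loaded and used to walk from a taxon up to the top of the selected clade. The sketch below reads the columns by position (Taxid, Common name, Scientific name, Parent, Rank) because the exact header spellings in the export are not given here; the function names and the sample rows are illustrative assumptions, not an actual download.

```python
import csv
import io

def load_taxonomy(text):
    """Map each taxid to its scientific name, parent taxid and rank."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    next(reader)  # skip the header row
    taxonomy = {}
    for row in reader:
        if len(row) < 5:
            continue  # ignore incomplete rows
        taxid, _common, scientific, parent, rank = row[:5]
        taxonomy[taxid] = {"name": scientific, "parent": parent, "rank": rank}
    return taxonomy

def lineage(taxonomy, taxid):
    """List the names from a taxon up to the highest ancestor in the file."""
    names, seen = [], set()
    while taxid in taxonomy and taxid not in seen:
        seen.add(taxid)  # guard against accidental cycles
        names.append(taxonomy[taxid]["name"])
        taxid = taxonomy[taxid]["parent"]
    return names

# Illustrative rows, written by hand for this example.
sample = (
    "Taxid\tCommon name\tScientific name\tParent\tRank\n"
    "27592\t\tBovinae\t9895\tsubfamily\n"
    "9903\t\tBos\t27592\tgenus\n"
    "9913\tCattle\tBos taurus\t9903\tspecies\n"
)
print(lineage(load_taxonomy(sample), "9913"))
```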
Exploring results
For each spectrum, the output file will give the best assignment, based on the highest number of marker peptides. It contains the following information :
- Peaks from the spectrum that match the marker peptides
- Score : the total number of marker peptides
- Assignment : largest subtree of the taxonomy that is compatible with the marker peptides found
- Rank : taxonomic rank of the assignment (e.g. species, genus, family)
- Species : the list of species supporting the assignment
Two other accompanying files are automatically created.
- detail_ (TSV file): this file contains the detail of the assignment (which markers are found for which species). It also provides the intensity of the peaks used in the assignment.
- report_ (TXT file): this file contains a report on the run's inputs (number of mass spectra, number of species tested, parameters...)
Additional information
PTM description
Peptide tables include a field labeled PTM, which is utilized to describe the post-translational modifications (PTMs) applied to the corresponding peptide. PAMPA recognizes three types of PTMs :
- Hydroxylation of prolines, giving oxyprolines (indicated by the single-letter code 'O')
- Deamidation of asparagine and glutamine (indicated by the single-letter code 'D')
- Phosphorylation of serine, threonine, and tyrosine (indicated by the single-letter code 'P')
The PTM description is a concise representation of the number of proline hydroxylations (oxyprolines), deamidations and phosphorylations needed to compute the mass of a peptide sequence. For instance, '2O1D' signifies two oxyprolines and one deamidation, '1P4O' represents one phosphorylation and four oxyprolines, and '2O' corresponds to two oxyprolines without any deamidation or phosphorylation. When no PTM applies, the description should be '0O', or '0D', etc.
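The notation can be decoded with a few lines of Python. The sketch below is illustrative rather than PAMPA's own parser; `parse_ptm` is a hypothetical name, and returning `None` for an empty description is a convention chosen for this example.

```python
import re

def parse_ptm(description):
    """Turn a PTM description such as '2O1D' into counts per PTM code."""
    description = description.strip()
    if not description:
        return None  # empty field: PTMs are not specified
    # The whole string must be a run of <count><code> pairs.
    if not re.fullmatch(r"(\d+[ODP])+", description):
        raise ValueError(f"invalid PTM description: {description!r}")
    counts = {"O": 0, "D": 0, "P": 0}
    for number, code in re.findall(r"(\d+)([ODP])", description):
        counts[code] += int(number)
    return counts

for text in ("2O1D", "1P4O", "2O", "0O"):
    print(text, parse_ptm(text))
```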
When the PTM description field is left empty in the peptide table, it signifies that PTMs are not specified. In such cases, PAMPA directly infers PTMs based on two rules:
- No deamidation or phosphorylation is added.
- The number of oxyprolines is determined empirically using the following rule: let 'p' be the total number of prolines in the peptide, and 'pp' the number of prolines involved in the pattern 'GxP'. If the difference 'p-pp' is less than 3, then 'pp' oxyprolines are applied. If 'p-pp' is 3 or greater, two variants are considered: one with 'pp' oxyprolines and one with 'pp+1' oxyprolines.
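The empirical rule for oxyprolines can be sketched as follows. Two points are assumptions of this example rather than statements from the documentation: a proline is counted as "involved in the pattern 'GxP'" when the residue two positions before it is a glycine, and the function name `infer_oxyproline_counts` is hypothetical.

```python
def infer_oxyproline_counts(sequence):
    """Return the candidate numbers of oxyprolines for a peptide sequence."""
    p = sequence.count("P")
    # Assumption: a proline belongs to 'GxP' if a glycine sits two residues before it.
    pp = sum(
        1
        for i in range(2, len(sequence))
        if sequence[i] == "P" and sequence[i - 2] == "G"
    )
    if p - pp < 3:
        return [pp]
    return [pp, pp + 1]

# 'GAPGPPGPK' has 4 prolines, 2 of them in 'GxP', so p-pp is 2.
print(infer_oxyproline_counts("GAPGPPGPK"))
# 'PPPPGAP' has 5 prolines, 1 of them in 'GxP', so p-pp is 4.
print(infer_oxyproline_counts("PPPPGAP"))
```

Each candidate count corresponds to one theoretical mass to compare against the spectrum.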