Bonsai :: Bioinformatics Software Server

Input form (basic)

Mass spectra

Upload the MS spectra files : PAMPA can process MALDI-TOF and MALDI-FTICR spectra. In all cases, we recommend deisotoping the mass spectra before processing them.
It recognizes the following formats from the file name extension:

CSV format: It consists of two columns. The first column is designated for mass (m/z), and the second column records intensity (I). Columns are separated by either a comma (',') or a semicolon (';'). The initial row serves as the header.
MGF format: Mascot Generic Format
mzML format: see https://www.psidev.info/mzML
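To illustrate the CSV layout described above, here is a minimal Python sketch that reads a two-column spectrum. The function name `read_csv_spectrum` and the sample peak values are hypothetical, and detecting the delimiter from the header row is an assumption of this sketch, not a description of how PAMPA parses files.

```python
import csv

def read_csv_spectrum(text):
    """Parse a two-column CSV spectrum (m/z, intensity) into a list of tuples."""
    lines = text.strip().splitlines()
    # Assumption: guess the delimiter from the header row (';' if present, else ',').
    delimiter = ";" if ";" in lines[0] else ","
    reader = csv.reader(lines, delimiter=delimiter)
    next(reader)  # the initial row is the header
    peaks = []
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or incomplete rows
        peaks.append((float(row[0]), float(row[1])))
    return peaks

# Illustrative values only, not taken from a real spectrum.
example = "mass;intensity\n1105.58;2300\n1427.71;870\n"
print(read_csv_spectrum(example))
```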

You can upload several files, or provide a single ZIP archive containing all of them.

Mass error : The error margin is related to the resolution of the mass spectrometer, that is, its ability to distinguish closely spaced peaks. PAMPA uses it as an upper bound on the deviation between a peak and the theoretical mass of the associated peptide.

Optimize for MALDI-TOF spectra: This option corresponds to a value of 50 ppm.
Optimize for MALDI-FTICR spectra: This option corresponds to a value of 5 ppm.
Custom value in ppm: Enter any value between 1 and 1000.
Custom value in Daltons : Enter any value between 0.002 and 0.998
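The difference between a ppm and a Dalton error margin can be sketched as follows. The function `within_tolerance` is hypothetical, the masses are illustrative, and computing the ppm window relative to the theoretical mass is an assumption of this sketch.

```python
def within_tolerance(observed, theoretical, error, unit="ppm"):
    """Return True if an observed peak lies within the mass error of a theoretical mass."""
    deviation = abs(observed - theoretical)
    if unit == "ppm":
        # ppm is relative: the allowed deviation scales with the theoretical mass
        # (assumption: the window is computed from the theoretical mass).
        allowed = theoretical * error / 1e6
    elif unit == "Da":
        # Daltons are absolute: the same window applies at every mass.
        allowed = error
    else:
        raise ValueError("unit must be 'ppm' or 'Da'")
    return deviation <= allowed

# A peak 0.04 Da away from a theoretical mass of 1105.58 (illustrative values).
print(within_tolerance(1105.62, 1105.58, 50))   # MALDI-TOF setting
print(within_tolerance(1105.62, 1105.58, 5))    # MALDI-FTICR setting
```

At a mass near 1100, 50 ppm corresponds to roughly 0.055 Da and 5 ppm to roughly 0.0055 Da, so the same peak is accepted under the first setting and rejected under the second.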

Results

Only optimal results : with this option, PAMPA identifies the species with the smallest P-value for each mass spectrum.
Near-optimal results within a suboptimality percentage : this option also returns near-optimal solutions. You can set the suboptimality range as a percentage from 0 to 100; the default is 100, which corresponds to solutions with the highest number of marker peptides.
For example, if the optimal solution has 11 marker peptides, a value of 80 will provide solutions with 9 markers or more.
All results within a suboptimality percentage : this option is linked to the previous option and modifies its behavior. When the previous option is used alone, it generates only near-optimal solutions that are not included in any other solution. This option makes the program compute all solutions, even those that are included in other solutions.
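The example of 11 marker peptides at 80 percent can be reproduced with a short calculation. The helper `min_markers` is hypothetical, and rounding up is an assumption inferred from that example (80% of 11 is 8.8, and the stated cutoff is 9).

```python
import math

def min_markers(optimal_count, percentage):
    """Smallest number of markers a solution needs to be reported.

    Assumption: the threshold is rounded up to the next whole marker.
    """
    return math.ceil(optimal_count * percentage / 100)

print(min_markers(11, 80))   # 9
print(min_markers(11, 100))  # 11
```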

Advanced analysis

Peptide tables

Peptide markers are organized within peptide tables, which are TSV files where each column corresponds to a field. Twelve fields are recognized by the program.

Rank : Taxonomic rank
Taxid : Taxonomic identifier
Taxon name : Scientific name
Sequence : Marker peptide sequence
PTM : Description of post-translational modifications applied to the marker peptide (see PTM description section)
Name : Marker name
Mass : Peptide mass
Gene : Gene name, e.g., COL1A1
SeqId : Sequence identifier(s) of the protein sequence from which the marker peptide is derived
Begin : Start position of the peptide marker within the protein sequence
End : End position of the peptide marker within the protein sequence
Comment : Additional comments about the marker

The first row of the file should contain column headings.

Most of these fields are optional and are here for reference. The following information is mandatory:

You must provide a taxid for the peptide marker. Rank and taxon names are included primarily to enhance the clarity of results.
You must provide either a sequence, possibly with a PTM description, or a mass for your marker peptide. If the sequence is provided without a mass, the program will automatically compute the mass from it. To do so, it will use either the PTM description (when available) or infer potential PTMs from the sequence.

Lastly, you have the option to include additional fields (i.e., extra columns) for your own purposes. These fields will be disregarded by PAMPA.
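The mandatory-field rules above can be checked before uploading a table. The sketch below is a hypothetical helper, not part of PAMPA; it assumes the column headings are spelled exactly like the field names listed above (`Taxid`, `Sequence`, `Mass`). The sample peptide is a fragment of the COL1A1 sequence shown in the FASTA section; the second row is deliberately incomplete.

```python
import csv
import io

def check_peptide_table(tsv_text):
    """Return a list of problems found in a peptide table given as TSV text."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    problems = []
    for number, row in enumerate(reader, start=2):  # row 1 holds the headings
        # A taxid is mandatory for every marker.
        if not (row.get("Taxid") or "").strip():
            problems.append(f"row {number}: missing Taxid")
        # Either a sequence or a mass must be present.
        has_sequence = bool((row.get("Sequence") or "").strip())
        has_mass = bool((row.get("Mass") or "").strip())
        if not (has_sequence or has_mass):
            problems.append(f"row {number}: needs a Sequence or a Mass")
    return problems

table = (
    "Taxid\tTaxon name\tSequence\tPTM\tMass\n"
    "9913\tBos taurus\tGPAGPPGR\t\t\n"   # valid: taxid and sequence
    "\tBos taurus\t\t\t\n"               # invalid: no taxid, no sequence, no mass
)
for problem in check_peptide_table(table):
    print(problem)
```

Extra columns are simply ignored by this check, which mirrors the statement that additional fields are disregarded.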

Where to find peptide tables, and how to generate them? An example peptide table for mammals is accessible here. You can manually edit these peptide table files or create your own using any spreadsheet software and opting for the TSV export format.
Alternatively, PAMPA CRAFT offers automated methods for generating peptide tables.

FASTA sequences

PAMPA processes amino-acid sequences in the standard FASTA format with a UniProtKB-like header. The first line starts with a greater-than character (>) followed by a sequence identifier (SeqID), which is provided for informational purposes and can be customized by the user. Additionally, this line must contain three mandatory fields :

OS: scientific name of the organism
OX: taxonomic identifier of the organism, as assigned by the NCBI
GN: gene name

The other lines are the sequence representation, with one letter per amino acid.

For example:

   >P02453 OS=Bos taurus OX=9913 GN=COL1A1
   MFSFVDLRLLLLLAATALLTHGQEEGQEEGQEEDIPPVTCVQNGLRYHDRDVWKPVPCQI
   CVCDNGNVLCDDVICDELKDCPNAKVPTDECCPVCPEGQESPTDQETTGVEGPKGDTGPR
   GPRGPAGPPGRDGIPGQPGLPGPPGPPGPPGPPGLGGNFAPQLSYGYDEKSTGISVPGPM
   GPSGPRGLPGPPGAPGPQGFQGPPGEPGEPGASGPMGPRGPPGPPGKNGDDGEAGKPGRP
   GERGPPGPQGARGLPGTAGLPGMKGHRGFSGLDGAKGDAGPAGPKGEPGSPGENGAPGQM
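As a rough illustration of the header layout, the following Python sketch extracts the sequence identifier and the three mandatory fields from a header line. The function `parse_fasta_header` and its regular expression are assumptions of this sketch and may differ from the parsing PAMPA actually performs.

```python
import re

def parse_fasta_header(line):
    """Extract SeqID, OS, OX and GN from a UniProtKB-like FASTA header line."""
    if not line.startswith(">"):
        raise ValueError("a FASTA header must start with '>'")
    parts = line[1:].split()
    fields = {"SeqID": parts[0] if parts else ""}
    # Each KEY=value field runs until the next two-letter KEY= or the end of the line.
    pattern = r"\b(OS|OX|GN)=(.*?)(?=\s+[A-Z]{2}=|\s*$)"
    for key, value in re.findall(pattern, line):
        fields[key] = value
    missing = [key for key in ("OS", "OX", "GN") if key not in fields]
    if missing:
        raise ValueError("missing mandatory field(s): " + ", ".join(missing))
    return fields

print(parse_fasta_header(">P02453 OS=Bos taurus OX=9913 GN=COL1A1"))
```

The organism name may contain spaces, which is why the sketch reads each value up to the next `XX=` field instead of splitting on whitespace.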

Taxonomy

The program can optionally add taxonomic information to the species identification. In this case, you can use the file provided or submit your own.

The taxonomy must be in the form of a Tab-Separated Values (TSV) file comprising five columns: Taxid, Common name, Scientific name, Parent (taxid), and Rank (species, genus, etc.). You can obtain this type of file directly from UniProt (https://www.uniprot.org/taxonomy) by following these steps:

Use the search bar to find your desired clade, entering its common name, scientific name, or taxid.
Select the clade of interest and click on 'Browse all descendants.'
Locate the 'download' link.
Choose the TSV format and customize the columns in the following order: Common name, Scientific name, Parent, and Rank.
Proceed to download the taxonomy file.
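Once downloaded, the five-column file can be inspected with a few lines of Python. This is a sketch under stated assumptions: the file starts with a header row, the columns appear in the order Taxid, Common name, Scientific name, Parent, Rank, and the Parent column holds a bare taxid. The helper names and the two sample rows are illustrative.

```python
import csv
import io

def load_taxonomy(tsv_text):
    """Map each taxid to (scientific name, parent taxid, rank)."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(reader)  # assumption: the first row holds the column headings
    taxonomy = {}
    for row in reader:
        if len(row) < 5:
            continue  # skip incomplete rows
        taxid, _common, scientific, parent, rank = row[:5]
        taxonomy[taxid] = (scientific, parent, rank)
    return taxonomy

def lineage(taxonomy, taxid):
    """Follow the Parent column upwards until leaving the loaded clade."""
    path = []
    while taxid in taxonomy and taxid not in path:
        path.append(taxid)
        taxid = taxonomy[taxid][1]
    return path

sample = (
    "Taxid\tCommon name\tScientific name\tParent\tRank\n"
    "9903\t\tBos\t27592\tgenus\n"
    "9913\tBovine\tBos taurus\t9903\tspecies\n"
)
taxonomy = load_taxonomy(sample)
print(lineage(taxonomy, "9913"))
```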

Exploring results

For each spectrum, the output file will give the best assignment, based on the highest number of marker peptides. It contains the following information :

Peaks from the spectrum that match the marker peptides
Score : the total number of marker peptides
Assignment : largest subtree of the taxonomy that is compatible with the marker peptides found
Rank : taxonomic rank of the assignment (e.g. species, genus, family)
Species : the list of species supporting the assignment

Two other accompanying files are automatically created.

detail_ (TSV file): this file contains the detail of the assignment (which markers are found for which species). It also provides the intensity of the peaks used in the assignment.
report_ (TXT file): this file contains a report on the run's inputs (number of mass spectra, number of species tested, parameters...)

Additional information

PTM description

Peptide tables include a field labeled PTM, which is utilized to describe the post-translational modifications (PTMs) applied to the corresponding peptide. PAMPA recognizes three types of PTMs :

Oxylation (hydroxylation) of prolines (indicated by the single-letter code 'O')
Deamidation of asparagine and glutamine (indicated by the single-letter code 'D')
Phosphorylation of serine, threonine, and tyrosine (indicated by the single-letter code 'P')

The PTM description is a concise representation of the number of oxylations, deamidations and phosphorylations necessary to compute the mass of a peptide sequence. For instance, '2O1D' signifies two oxyprolines and one deamidation, '1P4O' represents one phosphorylation and four oxyprolines, and '2O' corresponds to two oxyprolines without any deamidation or phosphorylation. When no PTM applies, the description should be '0O', '0D', etc.
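The notation can be decoded mechanically. The sketch below is a hypothetical parser written for illustration; rejecting anything other than digit-plus-code pairs is an assumption, and the empty description is treated as an error here because it carries a different meaning (PTMs not specified).

```python
import re

def parse_ptm(description):
    """Turn a PTM description such as '2O1D' into counts per PTM code."""
    counts = {"O": 0, "D": 0, "P": 0}
    # Assumption: a description is one or more <number><code> pairs, in any order.
    if not re.fullmatch(r"(\d+[ODP])+", description):
        raise ValueError(f"not a valid PTM description: {description!r}")
    for number, code in re.findall(r"(\d+)([ODP])", description):
        counts[code] += int(number)
    return counts

print(parse_ptm("2O1D"))  # {'O': 2, 'D': 1, 'P': 0}
print(parse_ptm("1P4O"))  # {'O': 4, 'D': 0, 'P': 1}
print(parse_ptm("0O"))    # {'O': 0, 'D': 0, 'P': 0}
```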

When the PTM description field is left empty in the peptide table, it signifies that PTMs are not specified. In such cases, PAMPA directly infers PTMs based on two rules:

No deamidation or phosphorylation is added.
The number of oxyprolines is determined empirically using the following rule: let 'p' be the total number of prolines in the peptide, and 'pp' the number of prolines involved in the pattern 'GxP'. If the difference 'p-pp' is less than 3, then 'pp' oxyprolines are applied. If 'p-pp' is 3 or greater, both 'pp' oxyprolines and 'pp+1' oxyprolines are applied.
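The oxyproline rule can be written out as a short function. This is one reading of the rule, not PAMPA's code: a proline is counted in 'pp' when the residue two positions before it is a glycine, and the case 'p-pp' of 3 or greater is interpreted as yielding two candidate counts. The function name is hypothetical.

```python
def inferred_oxyproline_counts(sequence):
    """Return the candidate numbers of oxyprolines for a peptide without PTM description."""
    p = sequence.count("P")
    # A proline belongs to a 'GxP' pattern when the residue two positions earlier is G.
    pp = sum(
        1
        for i in range(2, len(sequence))
        if sequence[i] == "P" and sequence[i - 2] == "G"
    )
    if p - pp < 3:
        return [pp]
    # Interpretation: both pp and pp+1 oxyprolines are considered.
    return [pp, pp + 1]

# 'GPAGPPGR' occurs in the COL1A1 example: 3 prolines, 1 of them in a GxP pattern.
print(inferred_oxyproline_counts("GPAGPPGR"))  # [1]
# Synthetic peptide: 4 prolines, 1 in a GxP pattern, so p-pp is 3.
print(inferred_oxyproline_counts("PPPGAP"))    # [1, 2]
```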
