Automatic Quality Assessment of Peptide Tandem Mass Spectra

  • Authors:
  • Marshall Bern; David Goldberg; W. Hayes McDonald; John R. Yates

  • Affiliations:
  • Palo Alto Research Center, 3333 Coyote Hill Road, Palo Alto, CA 94304, USA; Palo Alto Research Center, 3333 Coyote Hill Road, Palo Alto, CA 94304, USA; The Scripps Research Institute, 10440 North Torrey Pines Road, La Jolla, CA 92037, USA; The Scripps Research Institute, 10440 North Torrey Pines Road, La Jolla, CA 92037, USA

  • Venue:
  • Bioinformatics
  • Year:
  • 2004

Abstract

Motivation: A powerful proteomics methodology couples high-performance liquid chromatography (HPLC) with tandem mass spectrometry and database-search software such as SEQUEST. Such a set-up, however, produces a large number of spectra, many of which are of too poor a quality to be useful. Hence, a filter that eliminates poor spectra before the database search can significantly improve throughput and robustness. Moreover, spectra that are judged to be of high quality but cannot be identified by database search are prime candidates for still more computationally intensive methods, such as de novo sequencing or wider database searches that include post-translational modifications.

Results: We report on two different approaches to assessing spectral quality prior to identification: binary classification, which predicts whether or not SEQUEST will be able to make an identification, and statistical regression, which predicts a more universal quality metric involving the number of b- and y-ion peaks. The best of our binary classifiers can eliminate over 75% of the unidentifiable spectra while losing only 10% of the identifiable spectra. Statistical regression can pick out spectra of modified peptides that can be identified by a de novo program but not by SEQUEST. In a section of independent interest, we discuss intensity normalization of mass spectra.
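The classifier result above is stated as an operating point: the fraction of unidentifiable spectra a filter discards versus the fraction of identifiable spectra it wrongly discards. The sketch below shows one way such a trade-off could be measured for any filter that assigns each spectrum a quality score, assuming each spectrum is labelled by whether SEQUEST identified it. The function names `filter_tradeoff` and `best_threshold`, the score scale, and the toy data are hypothetical illustrations, not the authors' classifier or features.

```python
from typing import Optional, Sequence, Tuple


def filter_tradeoff(scores: Sequence[float],
                    identified: Sequence[bool],
                    threshold: float) -> Tuple[float, float]:
    """Discard every spectrum whose (hypothetical) quality score is below `threshold`.

    Returns (fraction of unidentifiable spectra eliminated,
             fraction of identifiable spectra lost).
    """
    if len(scores) != len(identified):
        raise ValueError("scores and labels must have equal length")
    bad = [s for s, ok in zip(scores, identified) if not ok]
    good = [s for s, ok in zip(scores, identified) if ok]
    if not bad or not good:
        raise ValueError("need at least one spectrum of each class")
    eliminated = sum(s < threshold for s in bad) / len(bad)
    lost = sum(s < threshold for s in good) / len(good)
    return eliminated, lost


def best_threshold(scores: Sequence[float],
                   identified: Sequence[bool],
                   max_loss: float = 0.10) -> Tuple[Optional[float], float, float]:
    """Pick the cutoff that eliminates the most unidentifiable spectra
    while losing at most `max_loss` of the identifiable ones.

    Returns (threshold, eliminated, lost); threshold is None if no cutoff
    eliminates anything within the allowed loss.
    """
    best: Tuple[Optional[float], float, float] = (None, 0.0, 0.0)
    # Candidate cutoffs: every observed score (spectra strictly below are discarded).
    for t in sorted(set(scores)):
        eliminated, lost = filter_tradeoff(scores, identified, t)
        if lost <= max_loss and eliminated > best[1]:
            best = (t, eliminated, lost)
    return best


# Toy data (invented): five spectra SEQUEST could not identify, five it could.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.35, 0.6, 0.7, 0.8, 0.9]
identified = [False] * 5 + [True] * 5

print(filter_tradeoff(scores, identified, 0.45))  # (0.8, 0.2)
print(best_threshold(scores, identified))         # (0.35, 0.6, 0.0)
```

Sweeping the cutoff in this way traces the same kind of curve an ROC analysis gives; choosing the cutoff by a cap on identifiable-spectrum loss mirrors how the abstract frames its result (at most 10% of identifiable spectra lost), though the paper's own selection procedure is not described here.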