Automatic Quality Assessment of Peptide Tandem Mass Spectra

Authors:
Marshall Bern;David Goldberg;W. Hayes McDonald;John R. Yates
Affiliations:
Palo Alto Research Center, 3333 Coyote Hill Road, Palo Alto, CA 94304, USA;Palo Alto Research Center, 3333 Coyote Hill Road, Palo Alto, CA 94304, USA;The Scripps Research Institute, 10440 North Torrey Pines Road, La Jolla, CA 92037, USA;The Scripps Research Institute, 10440 North Torrey Pines Road, La Jolla, CA 92037, USA
Venue:
Bioinformatics
Year:
2004

Abstract

Motivation: A powerful proteomics methodology couples high-performance liquid chromatography (HPLC) with tandem mass spectrometry and database-search software, such as SEQUEST. Such a set-up, however, produces a large number of spectra, many of which are too poor in quality to be useful. Hence, a filter that eliminates poor spectra before the database search can significantly improve throughput and robustness. Moreover, spectra that are judged to be of high quality but cannot be identified by database search are prime candidates for still more computationally intensive methods, such as de novo sequencing or wider database searches that include post-translational modifications.

Results: We report on two different approaches to assessing spectral quality prior to identification: binary classification, which predicts whether or not SEQUEST will be able to make an identification, and statistical regression, which predicts a more universal quality metric involving the number of b- and y-ion peaks. The best of our binary classifiers can eliminate over 75% of the unidentifiable spectra while losing only 10% of the identifiable spectra. Statistical regression can pick out spectra of modified peptides that can be identified by a de novo program but not by SEQUEST. In a section of independent interest, we discuss intensity normalization of mass spectra.
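
The trade-off quoted in the Results (unidentifiable spectra eliminated versus identifiable spectra lost) can be illustrated with a small sketch. It assumes that some classifier has already assigned each spectrum a quality score, with higher meaning better; the abstract does not say how the scores are computed. The scores, labels, and function names below are hypothetical and exist only to show how a filter threshold might be chosen and evaluated. This is not the authors' classifier.

```python
import math


def threshold_for_retention(scores, identifiable, keep_fraction=0.90):
    """Return the highest threshold that still keeps at least keep_fraction
    of the identifiable spectra (a spectrum is kept if score >= threshold)."""
    good = sorted((s for s, y in zip(scores, identifiable) if y), reverse=True)
    if not good:
        raise ValueError("need at least one identifiable spectrum")
    # The small epsilon guards against floating-point round-up in the product.
    n_keep = max(1, math.ceil(keep_fraction * len(good) - 1e-9))
    return good[n_keep - 1]


def filter_rates(scores, identifiable, threshold):
    """Return (fraction of identifiable spectra lost,
    fraction of unidentifiable spectra eliminated)."""
    good = [s for s, y in zip(scores, identifiable) if y]
    bad = [s for s, y in zip(scores, identifiable) if not y]
    lost = sum(s < threshold for s in good) / len(good)
    eliminated = sum(s < threshold for s in bad) / len(bad)
    return lost, eliminated


# Hypothetical toy data: 10 identifiable and 8 unidentifiable spectra.
scores = [0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.20,
          0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.50, 0.60]
identifiable = [True] * 10 + [False] * 8

threshold = threshold_for_retention(scores, identifiable, keep_fraction=0.90)
lost, eliminated = filter_rates(scores, identifiable, threshold)
print(f"threshold={threshold}, lost={lost:.1%}, eliminated={eliminated:.1%}")
```

Fixing the retention rate first and then reading off the elimination rate mirrors how the abstract states its result. In practice the threshold would be chosen on a labeled training set and the two rates reported on held-out spectra.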