Motivation: A powerful proteomics methodology couples high-performance liquid chromatography (HPLC) with tandem mass spectrometry and database-search software, such as SEQUEST. Such a set-up, however, produces a large number of spectra, many of which are of too poor quality to be useful. Hence a filter that eliminates poor spectra before the database search can significantly improve throughput and robustness. Moreover, spectra judged to be of high quality, but that cannot be identified by database search, are prime candidates for still more computationally intensive methods, such as de novo sequencing or wider database searches including post-translational modifications.

Results: We report on two different approaches to assessing spectral quality prior to identification: binary classification, which predicts whether or not SEQUEST will be able to make an identification, and statistical regression, which predicts a more universal quality metric involving the number of b- and y-ion peaks. The best of our binary classifiers can eliminate over 75% of the unidentifiable spectra while losing only 10% of the identifiable spectra. Statistical regression can pick out spectra of modified peptides that can be identified by a de novo program but not by SEQUEST. In a section of independent interest, we discuss intensity normalization of mass spectra.
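
To make the reported operating point concrete (over 75% of unidentifiable spectra removed at a cost of 10% of identifiable ones), the following is a minimal sketch of how such a trade-off could be read off classifier scores. It is not the authors' classifier: the scores are made-up toy values, and the function names `threshold_for_recall` and `filter_rates` are hypothetical helpers introduced only for illustration.

```python
import math

def threshold_for_recall(good_scores, keep_fraction):
    """Highest cutoff that still keeps at least keep_fraction of the
    identifiable spectra, where a spectrum is kept if score >= cutoff."""
    ranked = sorted(good_scores, reverse=True)
    # Small epsilon guards against float round-up in keep_fraction * n.
    n_keep = math.ceil(keep_fraction * len(ranked) - 1e-9)
    return ranked[n_keep - 1]

def filter_rates(good_scores, bad_scores, cutoff):
    """Return (fraction of identifiable spectra kept,
    fraction of unidentifiable spectra eliminated)."""
    kept_good = sum(s >= cutoff for s in good_scores) / len(good_scores)
    removed_bad = sum(s < cutoff for s in bad_scores) / len(bad_scores)
    return kept_good, removed_bad

# Toy quality scores (assumed, not from the paper): higher means "looks identifiable".
good_scores = [0.95, 0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.6, 0.55, 0.2]
bad_scores = [0.6, 0.5, 0.45, 0.4, 0.3, 0.25, 0.1, 0.05]

# Accept losing 10% of identifiable spectra, i.e. keep 90% of them.
cutoff = threshold_for_recall(good_scores, 0.9)
kept_good, removed_bad = filter_rates(good_scores, bad_scores, cutoff)
print(cutoff, kept_good, removed_bad)
```

On this toy data, the cutoff that keeps 90% of the identifiable spectra removes 7 of the 8 unidentifiable ones. With a real classifier, the same calculation would be run on held-out spectra labeled by whether the database search succeeded.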