A fast coarse filtering method for peptide identification by mass spectrometry

  • Authors:
  • Smriti R. Ramakrishnan;Rui Mao;Aleksey A. Nakorchevskiy;John T. Prince;Willard S. Willard;Weijia Xu;Edward M. Marcotte;Daniel P. Miranker

  • Affiliations:
  • Department of Computer Sciences, The University of Texas at Austin, Austin, Texas 78712, USA;Department of Computer Sciences, The University of Texas at Austin, Austin, Texas 78712, USA;Department of Chemistry and Biochemistry, The University of Texas at Austin, Austin, Texas 78712, USA;Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas 78712, USA;Department of Computer Sciences, The University of Texas at Austin, Austin, Texas 78712, USA;Department of Computer Sciences, The University of Texas at Austin, Austin, Texas 78712, USA;Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas 78712, USA;Department of Computer Sciences, The University of Texas at Austin, Austin, Texas 78712, USA

  • Venue:
  • Bioinformatics
  • Year:
  • 2006

Abstract

Motivation: We reformulate the problem of comparing mass spectra by mapping spectra to a vector space model. Our search method leverages a metric space indexing algorithm to produce an initial candidate set, which can be followed by any fine ranking scheme.

Results: We consider three distance measures integrated into a multi-vantage point index structure. Of these, a semi-metric fuzzy cosine distance using peptide precursor mass constraints performs the best. The index acts as a coarse, lossless filter with respect to the SEQUEST and ProFound scoring schemes, reducing the number of distance computations and the number of returned candidates for fine filtering to about 0.5% and 0.02% of the database, respectively. The fuzzy cosine distance term improves specificity over a peptide precursor mass filter, reducing the number of returned candidates by an order of magnitude. Run time measurements suggest proportional speedups in overall search times. Using an implementation of ProFound's Bayesian score as an example of a fine filter on a test set of Escherichia coli protein fragmentation spectra, the top results of our sample system are consistent with those of SEQUEST.

Contact: smriti@cs.utexas.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
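
The coarse filtering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The bin width, the fuzzy cosine definition (peaks in adjacent bins count as matching), the tolerance values, and all function and field names are assumptions made for illustration. A plain linear scan stands in for the multi-vantage point index, which would prune most of these distance computations.

```python
import math
from collections import defaultdict

BIN_WIDTH = 1.0  # assumed m/z bin width in Da (hypothetical value)


def to_vector(peaks, bin_width=BIN_WIDTH):
    """Map a list of (m/z, intensity) peaks to a sparse binned vector."""
    vec = defaultdict(float)
    for mz, intensity in peaks:
        vec[int(mz // bin_width)] += intensity
    return dict(vec)


def fuzzy_cosine_distance(u, v, fuzz=1):
    """Angle between two sparse vectors, where peaks at most `fuzz` bins
    apart count as matching. This is an assumed definition, not the paper's.
    The result is symmetric and zero for identical vectors, but the
    triangle inequality is not guaranteed (a semi-metric)."""
    dot = 0.0
    for i, ui in u.items():
        for d in range(-fuzz, fuzz + 1):
            dot += ui * v.get(i + d, 0.0)
    norm_u = sum(x * x for x in u.values())
    norm_v = sum(x * x for x in v.values())
    if norm_u == 0.0 or norm_v == 0.0:
        return math.pi / 2  # treat an empty spectrum as maximally distant
    similarity = min(1.0, dot / math.sqrt(norm_u * norm_v))
    return math.acos(similarity)


def coarse_filter(query, database, mass_tol=2.0, radius=1.0):
    """Return IDs of database entries that pass both the precursor mass
    test and the fuzzy cosine radius test, nearest first.
    Linear scan used here as a stand-in for an index structure."""
    q_vec = to_vector(query["peaks"])
    hits = []
    for entry in database:
        if abs(entry["precursor_mass"] - query["precursor_mass"]) > mass_tol:
            continue  # precursor mass constraint
        dist = fuzzy_cosine_distance(q_vec, to_vector(entry["peaks"]))
        if dist <= radius:
            hits.append((dist, entry["id"]))
    return [pid for _, pid in sorted(hits)]


# Toy data, invented purely for illustration.
database = [
    {"id": "pepA", "precursor_mass": 1000.5,
     "peaks": [(100.2, 1.0), (200.7, 2.0), (300.1, 1.0)]},
    {"id": "pepB", "precursor_mass": 1000.9,
     "peaks": [(150.3, 1.0), (250.4, 2.0), (350.8, 1.0)]},
    {"id": "pepC", "precursor_mass": 1500.0,
     "peaks": [(100.2, 1.0), (200.7, 2.0), (300.1, 1.0)]},
]
query = {"precursor_mass": 1000.6,
         "peaks": [(101.1, 1.0), (200.2, 2.0), (300.9, 1.0)]}

candidates = coarse_filter(query, database)
print(candidates)  # ['pepA']
```

In this toy run, pepC is rejected by the precursor mass test alone, while pepB passes the mass test but is rejected by the spectral distance. That mirrors the abstract's point that the distance term adds specificity beyond a precursor mass filter. The surviving candidates would then be passed to a fine scoring scheme.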