Peptide sequence tags for fast database search in mass-spectrometry

Authors:
Ari Frank;Stephen Tanner;Pavel Pevzner
Affiliations:
Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA;Department of Bioinformatics, University of California, San Diego, La Jolla, CA;Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA
Venue:
RECOMB'05 Proceedings of the 9th Annual International Conference on Research in Computational Molecular Biology
Year:
2005

Abstract

Filtration techniques, which rapidly eliminate candidate sequences while retaining the true one, are key ingredients of database searches in genomics. Although SEQUEST and Mascot are sometimes referred to as “BLAST for mass-spectrometry”, the key algorithmic idea of BLAST (filtration) was never implemented in these tools. As a result, MS/MS protein identification tools are becoming too time-consuming for many applications, including searches for post-translationally modified peptides. Moreover, matching millions of spectra against all known proteins will soon make these tools too slow, in the same way that “genome vs. genome” comparisons instantly made BLAST too slow. We describe the development of filters for MS/MS database searches that dramatically reduce the running time and effectively remove the bottlenecks in searching the huge space of protein modifications. Our approach, based on a probability model for determining the accuracy of sequence tags, achieves superior results compared to GutenTag, a popular tag generation algorithm.
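
The filtration idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `filter_candidates`, the toy peptide strings, and the three-letter tags are all hypothetical, and the sketch ignores masses and tag scoring entirely. It only shows how a handful of short sequence tags can discard most database entries before any expensive spectrum scoring takes place.

```python
def filter_candidates(peptides, tags):
    """Keep only peptides that contain at least one sequence tag.

    Hypothetical helper for illustration: a plain substring test stands in
    for whatever matching strategy a real search tool would use.
    """
    unique_tags = set(tags)
    return [p for p in peptides if any(t in p for t in unique_tags)]


# Toy database and tags (made-up strings, not real search results).
database = ["PEPTIDEK", "SAMPLER", "TAGGINGK", "ELVISLIVESK"]
tags = ["TID", "LIV"]

survivors = filter_candidates(database, tags)
print(survivors)
```

Only the surviving candidates would then be passed to a slower, more careful scoring step. The filter is useful only if at least one generated tag is correct, which is why the accuracy of the tags matters.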