Generating Peptide Candidates from Amino-Acid Sequence Databases for Protein Identification via Mass Spectrometry

Authors:
Nathan Edwards; Ross Lippert
Venue:
WABI '02 Proceedings of the Second International Workshop on Algorithms in Bioinformatics
Year:
2002

Abstract

Protein identification via mass spectrometry forms the foundation of high-throughput proteomics. Tandem mass spectrometry, when applied to a complex mixture of peptides, selects and fragments each peptide to reveal its amino-acid sequence structure. The successful analysis of such an experiment typically relies on amino-acid sequence databases to provide a set of biologically relevant peptides to examine. A key subproblem, then, for amino-acid sequence database search engines that analyze tandem mass spectra is to efficiently generate all the peptide candidates from a sequence database with mass equal to one of a large set of observed peptide masses. We demonstrate that to solve the problem efficiently, we must deal with substring redundancy in the amino-acid sequence database and focus our attention on looking up the observed peptide masses quickly. We show that it is possible, with some preprocessing and memory overhead, to solve the peptide candidate generation problem in time asymptotically proportional to the size of the sequence database and the number of peptide candidates output.
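To make the candidate generation subproblem concrete, the following is a minimal, naive sketch in Python. It is not the algorithm of the paper: it simply enumerates substrings of a single sequence and keeps those whose mass matches an observed mass. The function name `peptide_candidates`, the small `NOMINAL_MASS` table (integer nominal residue masses for five amino acids only), and the choice to ignore the mass of water and any mass tolerance are all assumptions made for illustration. The `seen` set is a crude stand-in for handling substring redundancy, and the hash-set membership test stands in for fast lookup of observed masses.

```python
from typing import Iterable, List, Tuple

# Illustrative assumption: integer nominal residue masses for a few amino acids.
# A real search engine would use precise masses for all residues.
NOMINAL_MASS = {"G": 57, "A": 71, "S": 87, "P": 97, "V": 99}


def peptide_candidates(sequence: str,
                       observed_masses: Iterable[int]) -> List[Tuple[str, int]]:
    """Return distinct substrings of `sequence` whose residue-mass sum
    equals one of the observed masses (hypothetical helper, naive scan)."""
    targets = set(observed_masses)      # constant-time mass lookup
    if not targets:
        return []
    max_mass = max(targets)             # no candidate can be heavier than this
    seen = set()                        # skip substrings already reported
    candidates = []
    for start in range(len(sequence)):
        mass = 0
        for end in range(start, len(sequence)):
            mass += NOMINAL_MASS[sequence[end]]
            if mass > max_mass:
                break                   # extending further only adds mass
            if mass in targets:
                peptide = sequence[start:end + 1]
                if peptide not in seen:
                    seen.add(peptide)
                    candidates.append((peptide, mass))
    return candidates


if __name__ == "__main__":
    for peptide, mass in peptide_candidates("GASGAS", [128, 215]):
        print(peptide, mass)
```

This scan does work proportional to the sequence length times the longest candidate length, and it re-examines repeated substrings before discarding them. The abstract's claim concerns doing better than that: with preprocessing and extra memory, running time proportional to the database size plus the number of candidates output.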