An automata approach to match gapped sequence tags against protein database

Authors:
Yonghua Han;Bin Ma;Kaizhong Zhang
Affiliations:
Department of Computer Science, University of Western Ontario, London, Ontario, Canada;Department of Computer Science, University of Western Ontario, London, Ontario, Canada;Department of Computer Science, University of Western Ontario, London, Ontario, Canada
Venue:
CIAA'04 Proceedings of the 9th international conference on Implementation and Application of Automata
Year:
2004

Abstract

Tandem mass spectrometry (MS/MS) is the most important method for peptide and protein identification. One approach to interpreting MS/MS data is de novo sequencing, which is becoming increasingly accurate and important. However, de novo sequencing can usually determine only partial sequences with confidence; the undetermined parts are represented by “mass gaps”. We call such a partially determined sequence a gapped sequence tag. When a gapped sequence tag is searched against a database for protein identification, the determined parts must match the database sequence exactly, while each mass gap must match a substring of amino acids whose masses sum to the value of the mass gap. Standard string matching algorithms cannot handle this case. In this paper, we present a new, efficient algorithm for finding the matches of gapped sequence tags in a protein database.
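
To make the matching condition concrete, here is a minimal brute-force sketch in Python. It is not the automata-based algorithm of the paper, only a naive baseline that checks the definition directly. The tag representation (a list mixing strings for determined parts and floats for mass gaps), the function names `match_from` and `find_matches`, and the tolerance parameter `tol` are all assumptions made for illustration; the masses are standard monoisotopic residue masses in daltons.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def match_from(tag, seq, pos, tol):
    """Return the set of end positions at which `tag` matches `seq` from `pos`.

    `tag` is a list whose items are either a str (a determined part that must
    match exactly) or a float (a mass gap that must equal the total residue
    mass of a non-empty substring, within `tol`). This layout is hypothetical.
    """
    if not tag:
        return {pos}
    head, rest = tag[0], tag[1:]
    ends = set()
    if isinstance(head, str):
        # Determined part: exact string comparison at the current position.
        if seq.startswith(head, pos):
            ends |= match_from(rest, seq, pos + len(head), tol)
    else:
        # Mass gap: extend the substring one residue at a time and stop once
        # the running mass can no longer fall within the tolerance.
        total = 0.0
        j = pos
        while j < len(seq) and total <= head + tol:
            total += MASS[seq[j]]
            j += 1
            if abs(total - head) <= tol:
                ends |= match_from(rest, seq, j, tol)
    return ends


def find_matches(tag, seq, tol=0.02):
    """Return all (start, end) index pairs where `tag` matches seq[start:end]."""
    matches = []
    for start in range(len(seq)):
        for end in sorted(match_from(tag, seq, start, tol)):
            matches.append((start, end))
    return matches
```

For example, with a gap equal to the combined mass of S and P, `find_matches(["AG", MASS["S"] + MASS["P"], "EPT"], "MKAGSPEPTIDEK")` returns `[(2, 9)]`, since the gap is filled by the substring `SP`. The sketch also shows why plain string matching is insufficient: a gap of about 128.06 Da is satisfied by `Q` as well as by `GA`, so one tag can match several different database substrings. Scanning every start position this way is slow on a large database, which is the cost a more efficient algorithm would aim to avoid.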