An automata approach to match gapped sequence tags against protein database

  • Authors:
  • Yonghua Han; Bin Ma; Kaizhong Zhang

  • Affiliations:
  • Department of Computer Science, University of Western Ontario, London, Ontario, Canada (all authors)

  • Venue:
  • CIAA'04 Proceedings of the 9th international conference on Implementation and Application of Automata
  • Year:
  • 2004

Abstract

Tandem mass spectrometry (MS/MS) is the most important method for peptide and protein identification. One approach to interpreting MS/MS data is de novo sequencing, which is becoming increasingly accurate and important. However, de novo sequencing can usually determine only partial sequences with confidence, and the undetermined parts are represented by “mass gaps”. We call such a partially determined sequence a gapped sequence tag. When a gapped sequence tag is searched against a database for protein identification, the determined parts must match the database sequence exactly, while each mass gap must match a substring of amino acids whose masses sum to the value of the mass gap. In this setting, standard string matching algorithms no longer work. In this paper, we present a new, efficient algorithm for finding the matches of gapped sequence tags in a protein database.
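
To make the matching condition concrete, here is a minimal brute-force sketch in Python. It is not the automaton-based algorithm of the paper, which the abstract does not describe; it only illustrates what counts as a match. The names `RESIDUE_MASS`, `match_at`, and `find_matches`, the tag encoding (strings for determined parts, floats for mass gaps), and the mass tolerance `tol` are all assumptions made for this illustration. The table holds standard monoisotopic residue masses in daltons.

```python
# Standard monoisotopic residue masses (Da) of the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def match_at(tag, protein, pos, tol):
    """Return the set of end positions at which `tag` matches `protein`
    when the match is forced to begin at index `pos`.

    `tag` is a list whose items are either str (a determined part that
    must match exactly) or float (a mass gap, hypothetical encoding).
    """
    if not tag:
        return {pos}
    head, rest = tag[0], tag[1:]
    ends = set()
    if isinstance(head, str):
        # Determined part: exact substring match.
        if protein.startswith(head, pos):
            ends |= match_at(rest, protein, pos + len(head), tol)
    else:
        # Mass gap: extend a substring until its total mass reaches the gap.
        total, end = 0.0, pos
        while end < len(protein) and total < head - tol:
            mass = RESIDUE_MASS.get(protein[end])
            if mass is None:  # unknown residue code such as 'X'
                break
            total += mass
            end += 1
            if abs(total - head) <= tol:
                ends |= match_at(rest, protein, end, tol)
    return ends


def find_matches(tag, protein, tol=0.01):
    """Return all (start, end) index pairs where `tag` matches `protein`."""
    hits = []
    for start in range(len(protein)):
        for end in sorted(match_at(tag, protein, start, tol)):
            hits.append((start, end))
    return hits


# Gap equal to the combined mass of G + A + S.
gap = RESIDUE_MASS["G"] + RESIDUE_MASS["A"] + RESIDUE_MASS["S"]
print(find_matches(["TA", gap, "LV"], "MKTAGASLVK"))  # [(2, 9)]
```

One mass gap can correspond to different substrings. With the same tolerance, the tag `["T", 128.0586, "S"]` matches both `TQS` and `TGAS`, because the mass of Q is almost identical to the combined mass of G and A. This is the reason plain exact string matching is insufficient. The sketch tries every start position, so it shows the problem definition only and makes no attempt at the efficiency the paper targets.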