Blocked pattern matching problem and its applications in proteomics

  • Authors:
  • Julio Ng;Amihood Amir;Pavel A. Pevzner

  • Affiliations:
  • Bioinformatics and Systems Biology Program, University of California San Diego;Department of Computer Science, Bar-Ilan University and Department of Computer Science, Johns Hopkins University;Department of Computer Science and Eng., University of California San Diego

  • Venue:
  • RECOMB'11 Proceedings of the 15th Annual international conference on Research in computational molecular biology
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

Matching a mass spectrum against a text (a key computational task in proteomics) is slow since the existing text indexing algorithms (with search time independent of the text size) are not applicable in the domain of mass spectrometry. As a result, many important applications (e.g., searches for mutated peptides) are prohibitively timeconsuming and even the standard search for non-mutated peptides is becoming too slow with recent advances in high-throughput genomics and proteomics technologies. We introduce a new paradigm - the Blocked Pattern Matching (BPM) Problem - that models peptide identification. BPM corresponds to matching a pattern against a text (over the alphabet of integers) under the assumption that each symbol a in the pattern can match a block of consecutive symbols in the text with total sum a. BPM opens a new, still unexplored, direction in combinatorial pattern matching and leads to the Mutated BPM (modeling identification of mutated peptides) and Fused BPM (modeling identification of fused peptides in tumor genomes). We illustrate how BPM algorithms solve problems that are beyond the reach of existing proteomics tools.