Blocked pattern matching problem and its applications in proteomics

Authors:
Julio Ng;Amihood Amir;Pavel A. Pevzner
Affiliations:
Bioinformatics and Systems Biology Program, University of California San Diego;Department of Computer Science, Bar-Ilan University and Department of Computer Science, Johns Hopkins University;Department of Computer Science and Eng., University of California San Diego
Venue:
RECOMB'11 Proceedings of the 15th Annual international conference on Research in computational molecular biology
Year:
2011

Abstract

Matching a mass spectrum against a text (a key computational task in proteomics) is slow because existing text indexing algorithms, whose search time is independent of the text size, are not applicable in the domain of mass spectrometry. As a result, many important applications (e.g., searches for mutated peptides) are prohibitively time-consuming, and even the standard search for non-mutated peptides is becoming too slow given recent advances in high-throughput genomics and proteomics technologies. We introduce a new paradigm, the Blocked Pattern Matching (BPM) Problem, that models peptide identification. BPM corresponds to matching a pattern against a text (over the alphabet of integers) under the assumption that each symbol a in the pattern can match a block of consecutive symbols in the text with total sum a. BPM opens a new, still unexplored direction in combinatorial pattern matching and leads to the Mutated BPM Problem (modeling identification of mutated peptides) and the Fused BPM Problem (modeling identification of fused peptides in tumor genomes). We illustrate how BPM algorithms solve problems that are beyond the reach of existing proteomics tools.
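
The BPM matching rule can be made concrete with a small brute-force sketch. It is not the algorithm of the paper; it only checks the definition directly, in time proportional to the text length times the pattern length. The function name `blocked_matches` is hypothetical, and the sketch assumes that all text and pattern symbols are positive integers and that the pattern is non-empty. The example text uses the nominal integer residue masses of G, A, S, P, and V purely for illustration.

```python
from itertools import accumulate


def blocked_matches(text, pattern):
    """Return (start, end) pairs such that text[start:end] splits into
    consecutive blocks whose sums equal the pattern symbols, in order.

    Assumes positive integers in both inputs and a non-empty pattern.
    """
    # prefix[i] = sum(text[:i]); strictly increasing because symbols are positive,
    # so every prefix sum identifies exactly one boundary in the text.
    prefix = [0] + list(accumulate(text))
    index_of = {p: i for i, p in enumerate(prefix)}
    # Cumulative pattern sums: where each block must end, relative to the start.
    offsets = list(accumulate(pattern))

    matches = []
    for start in range(len(text)):
        base = prefix[start]
        # Every block boundary required by the pattern must exist in the text.
        if all(base + off in index_of for off in offsets):
            matches.append((start, index_of[base + offsets[-1]]))
    return matches


# Nominal residue masses of G, A, S, P, V (illustrative text).
text = [57, 71, 87, 97, 99]
print(blocked_matches(text, [128, 184]))  # blocks (57+71) and (87+97)
print(blocked_matches(text, [158, 97]))   # blocks (71+87) and (97)
```

In the first query each pattern symbol covers two text symbols. In the second, the first symbol covers two text symbols and the second covers one, which is the situation that arises when a spectrum is missing some fragment peaks.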