Fast search algorithms for position specific scoring matrices

Authors:
Cinzia Pizzi;Pasi Rastas;Esko Ukkonen
Affiliations:
Department of Computer Science and Helsinki Institute for Information Technology HIIT, University of Helsinki, Finland;Department of Computer Science and Helsinki Institute for Information Technology HIIT, University of Helsinki, Finland;Department of Computer Science and Helsinki Institute for Information Technology HIIT, University of Helsinki, Finland
Venue:
BIRD'07 Proceedings of the 1st international conference on Bioinformatics research and development
Year:
2007

Abstract

Fast search algorithms are developed for finding good instances of patterns given as position specific scoring matrices, and empirical results on their performance on DNA sequences are reported. The algorithms generalize the Aho-Corasick, filtration, and superalphabet techniques of string matching to scoring matrix search. Compared with the naive search, our algorithms can be faster by a factor proportional to the length of the pattern. In our experimental comparison, the new algorithms were clearly faster than the naive method and also faster than the well-known lookahead scoring algorithm. The Aho-Corasick technique is the fastest for short patterns and high significance thresholds. For longer patterns the filtration method is better, while the superalphabet technique is the best for very long patterns and low significance levels. We also observed that the actual speed of all these algorithms is very sensitive to implementation details.
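
To make the two baselines named in the abstract concrete, the following is a minimal sketch of the naive search and of lookahead scoring. It is not the authors' implementation. The function names, the list-of-dicts matrix representation, and the example matrix are all assumptions made for illustration. Lookahead scoring is taken here in its usual sense: a window is abandoned as soon as the score so far, plus the best score still obtainable from the remaining columns, falls below the threshold.

```python
from typing import Dict, List

# Assumed representation: one dict of symbol -> score per pattern position.
Matrix = List[Dict[str, int]]


def naive_search(matrix: Matrix, seq: str, threshold: int) -> List[int]:
    """Score every window of length m in full; O(m * n) symbol lookups."""
    m = len(matrix)
    hits = []
    for i in range(len(seq) - m + 1):
        score = sum(matrix[j][seq[i + j]] for j in range(m))
        if score >= threshold:
            hits.append(i)
    return hits


def lookahead_search(matrix: Matrix, seq: str, threshold: int) -> List[int]:
    """Same result as naive_search, but stop scoring a window early when
    even the best possible remaining columns cannot reach the threshold."""
    m = len(matrix)
    # max_rest[j] = maximum total score obtainable from columns j..m-1.
    max_rest = [0] * (m + 1)
    for j in range(m - 1, -1, -1):
        max_rest[j] = max_rest[j + 1] + max(matrix[j].values())

    hits = []
    for i in range(len(seq) - m + 1):
        score = 0
        for j in range(m):
            score += matrix[j][seq[i + j]]
            if score + max_rest[j + 1] < threshold:
                break  # this window can no longer reach the threshold
        else:
            # Loop finished without pruning, so the full score >= threshold.
            hits.append(i)
    return hits


# Hypothetical 3-column matrix that favours the motif "ACG".
example_matrix: Matrix = [
    {"A": 2, "C": -1, "G": -1, "T": -1},
    {"A": -1, "C": 2, "G": -1, "T": -1},
    {"A": -1, "C": -1, "G": 2, "T": 0},
]
example_seq = "TTACGACTACG"

print(naive_search(example_matrix, example_seq, 4))      # [2, 5, 8]
print(lookahead_search(example_matrix, example_seq, 4))  # [2, 5, 8]
```

Both functions report the same start positions. The lookahead variant only saves work, and it saves more as the threshold rises, because windows are discarded after fewer columns. The sketch uses integer scores so that the pruning test is exact; with floating-point scores a small tolerance would be needed.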