We develop fast search algorithms for finding good instances of patterns given as position-specific scoring matrices and report empirical results on their performance on DNA sequences. The algorithms generalize the Aho-Corasick, filtration, and superalphabet techniques of string matching to scoring-matrix search. Compared to naive search, our algorithms can be faster by a factor proportional to the length of the pattern. In our experimental comparison, the new algorithms were clearly faster than the naive method and also faster than the well-known lookahead scoring algorithm. The Aho-Corasick technique is the fastest for short patterns and high significance thresholds. For longer patterns the filtration method is better, while the superalphabet technique is the best for very long patterns and low significance levels. We also observed that the actual speed of all these algorithms is very sensitive to implementation details.
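For orientation, the two baselines named above (naive search and lookahead scoring) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the matrix is assumed to be a list of per-position dictionaries mapping each symbol to a score, the function names `naive_search` and `lookahead_search` are hypothetical, and a window counts as a match when its total score is at least the threshold. The lookahead variant abandons a window as soon as the partial score plus the best score still attainable from the remaining columns falls below the threshold.

```python
from typing import Dict, List, Sequence

# Assumed representation: one dict of symbol -> score per pattern position.
Matrix = Sequence[Dict[str, float]]


def naive_search(text: str, matrix: Matrix, threshold: float) -> List[int]:
    """Score every window of len(matrix) symbols in full; return start positions of hits."""
    m = len(matrix)
    hits = []
    for i in range(len(text) - m + 1):
        score = sum(matrix[j][text[i + j]] for j in range(m))
        if score >= threshold:
            hits.append(i)
    return hits


def lookahead_search(text: str, matrix: Matrix, threshold: float) -> List[int]:
    """Same result as naive_search, but stop scoring a window once it cannot reach the threshold."""
    m = len(matrix)
    # max_rest[j] = best score attainable from columns j..m-1 (max_rest[m] = 0).
    max_rest = [0.0] * (m + 1)
    for j in range(m - 1, -1, -1):
        max_rest[j] = max_rest[j + 1] + max(matrix[j].values())

    hits = []
    for i in range(len(text) - m + 1):
        score = 0.0
        for j in range(m):
            score += matrix[j][text[i + j]]
            # Even a perfect remainder cannot rescue this window: give up early.
            if score + max_rest[j + 1] < threshold:
                break
        else:
            # Loop finished without pruning, so the full score meets the threshold.
            hits.append(i)
    return hits


if __name__ == "__main__":
    # Toy matrix of length 3 that favours the string "ACG" (values are made up).
    toy = [
        {"A": 2, "C": -1, "G": -1, "T": -1},
        {"A": -1, "C": 2, "G": -1, "T": -1},
        {"A": -1, "C": -1, "G": 2, "T": 0},
    ]
    print(naive_search("TTACGACTAAG", toy, 4))
    print(lookahead_search("TTACGACTAAG", toy, 4))
```

Both functions report the same positions; the lookahead version only saves work, and it saves the most when the threshold is high, because most windows are then rejected after a few columns.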