Computer Networks: The International Journal of Computer and Telecommunications Networking
We consider string matching with variable length gaps. Given a string T and a pattern P consisting of strings separated by variable length gaps (arbitrary strings of length in a specified range), the problem is to find all ending positions of substrings in T that match P. This problem is a basic primitive in computational biology applications. Let m and n be the lengths of P and T, respectively, and let k be the number of strings in P. We present a new algorithm achieving time O(n log k + m + α) and space O(m + A), where A is the sum of the lower bounds of the lengths of the gaps in P and α is the total number of occurrences of the strings in P within T. Compared to the previous results, this bound essentially achieves the best known time and space complexities simultaneously. Consequently, our algorithm obtains the best known bounds for almost all combinations of m, n, k, A, and α. Our algorithm is surprisingly simple and straightforward to implement. We also present algorithms for finding and encoding the positions of all strings in P for every match of the pattern.
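To make the problem statement concrete, the following is a minimal baseline sketch in Python. It is not the algorithm of the paper and does not achieve the O(n log k + m + α) bound: it simply processes the k strings one at a time and keeps the end positions at which a prefix of the pattern can finish. The function names (find_occurrence_ends, match_with_gaps), the representation of P as a list of strings plus a list of (lower, upper) gap bounds, and the use of 0-based inclusive end positions are all assumptions made for illustration. The strings in P are assumed to be non-empty.

```python
def find_occurrence_ends(text, word):
    """Return the 0-based inclusive end index of every occurrence of word in text."""
    ends = []
    start = text.find(word)
    while start != -1:
        ends.append(start + len(word) - 1)
        start = text.find(word, start + 1)  # allow overlapping occurrences
    return ends


def match_with_gaps(text, strings, gaps):
    """Report end positions in text of substrings matching the gapped pattern.

    strings: the k non-empty strings P1..Pk of the pattern (hypothetical encoding).
    gaps:    k-1 pairs (lo, hi); gap i separates strings[i] and strings[i+1]
             and may be any string of length between lo and hi inclusive.
    """
    assert len(gaps) == len(strings) - 1
    n = len(text)
    # End positions where the pattern prefix P1 matches.
    valid = find_occurrence_ends(text, strings[0])
    for word, (lo, hi) in zip(strings[1:], gaps):
        # prefix[j] counts valid previous end positions strictly below j,
        # so a range query over previous ends takes constant time.
        marks = [0] * n
        for e in valid:
            marks[e] = 1
        prefix = [0] * (n + 1)
        for j in range(n):
            prefix[j + 1] = prefix[j] + marks[j]
        extended = []
        for e in find_occurrence_ends(text, word):
            s = e - len(word) + 1          # start of this occurrence
            left = max(0, s - 1 - hi)      # earliest allowed previous end
            right = s - 1 - lo             # latest allowed previous end
            if right >= left and prefix[right + 1] - prefix[left] > 0:
                extended.append(e)
        valid = extended
    return valid
```

For example, with T = "abxxcdabxcd" and the pattern "ab", a gap of length 1 to 2, then "cd", match_with_gaps("abxxcdabxcd", ["ab", "cd"], [(1, 2)]) reports the end positions 5 and 10. This baseline takes roughly O(kn) time beyond the cost of locating the occurrences, which is the kind of dependence on k that the bound stated in the abstract improves on.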