Given strings P and Q, the (exact) string matching problem is to find all positions of substrings in Q matching P. The classical Knuth-Morris-Pratt algorithm [SIAM J. Comput., 1977] solves the string matching problem in linear time, which is optimal if we can only read one character at a time. However, most strings are stored in a computer in a packed representation with several characters in a single word, giving us the opportunity to read multiple characters simultaneously. In this paper we study the worst-case complexity of string matching on strings given in packed representation. Let m ≤ n be the lengths of P and Q, respectively, and let $\sigma$ denote the size of the alphabet. On a standard unit-cost word-RAM with logarithmic word size we present an algorithm using time $$O\left(\frac{n}{\log_\sigma n} + m + \mathrm{occ}\right).$$ Here occ is the number of occurrences of P in Q. For m = o(n) this improves the O(n) bound of the Knuth-Morris-Pratt algorithm. Furthermore, if $m = O(n/\log_\sigma n)$ our algorithm is optimal, since any algorithm must spend at least $\Omega\left(\frac{(n+m)\log \sigma}{\log n} + \mathrm{occ}\right) = \Omega\left(\frac{n}{\log_\sigma n} + \mathrm{occ}\right)$ time to read the input and report all occurrences. The result is obtained by a novel automaton construction based on the Knuth-Morris-Pratt algorithm combined with a new compact representation of subautomata allowing an optimal tabulation-based simulation.
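To make the idea of a packed representation concrete, the following minimal Python sketch stores each character in $\lceil \log_2 \sigma \rceil$ bits and compares a whole window of Q against P with a single integer comparison. It is only an illustration of reading several characters at once, not the automaton-based algorithm of the paper. The function names `pack` and `packed_find_all` are hypothetical, and Python's arbitrary-precision integers stand in for machine words, so the comparison is constant-time only when the packed pattern fits in one word.

```python
from math import ceil, log2


def pack(s, alphabet):
    """Pack s into one integer, using ceil(log2(sigma)) bits per character.

    The first character of s ends up in the most significant position.
    """
    bits = max(1, ceil(log2(len(alphabet))))
    code = {c: i for i, c in enumerate(alphabet)}
    value = 0
    for c in s:
        value = (value << bits) | code[c]
    return value, bits


def packed_find_all(p, q, alphabet):
    """Return all start positions of p in q by comparing packed windows.

    Illustrative assumption: a Python int plays the role of a machine word.
    This is a naive scan over packed data, not the paper's algorithm.
    """
    m, n = len(p), len(q)
    if m == 0 or m > n:
        return []
    packed_p, bits = pack(p, alphabet)
    packed_q, _ = pack(q, alphabet)
    mask = (1 << (bits * m)) - 1
    hits = []
    for i in range(n - m + 1):
        # The window q[i:i+m] lies (n - m - i) characters above the low end.
        window = (packed_q >> (bits * (n - m - i))) & mask
        if window == packed_p:  # one comparison covers all m characters
            hits.append(i)
    return hits


print(packed_find_all("ACA", "ACACAGACA", "ACGT"))  # prints [0, 2, 6]
```

As a rough sense of scale for the bound above: with $\sigma = 4$ and $n = 2^{20}$ we have $\log_\sigma n = 10$, so the $n/\log_\sigma n$ term is about a tenth of the $n$ character reads a one-character-at-a-time algorithm needs.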