Fast Searching in Packed Strings

Authors:
Philip Bille
Affiliations:
Technical University of Denmark
Venue:
CPM '09 Proceedings of the 20th Annual Symposium on Combinatorial Pattern Matching
Year:
2009

Abstract

Given strings P and Q, the (exact) string matching problem is to find all positions of substrings in Q matching P. The classical Knuth-Morris-Pratt algorithm [SIAM J. Comput., 1977] solves the string matching problem in linear time, which is optimal if we can only read one character at a time. However, most strings are stored in a computer in a packed representation with several characters in a single word, giving us the opportunity to read multiple characters simultaneously. In this paper we study the worst-case complexity of string matching on strings given in packed representation. Let m ≤ n be the lengths of P and Q, respectively, and let σ denote the size of the alphabet. On a standard unit-cost word-RAM with logarithmic word size we present an algorithm using time $$O\left(\frac{n}{\log_\sigma n} + m + \mathrm{occ}\right).$$ Here occ is the number of occurrences of P in Q. For m = o(n) this improves the O(n) bound of the Knuth-Morris-Pratt algorithm. Furthermore, if $m = O(n/\log_\sigma n)$ our algorithm is optimal, since any algorithm must spend at least $\Omega\left(\frac{(n+m)\log \sigma}{\log n} + \mathrm{occ}\right) = \Omega\left(\frac{n}{\log_\sigma n} + \mathrm{occ}\right)$ time to read the input and report all occurrences. The result is obtained by a novel automaton construction based on the Knuth-Morris-Pratt algorithm combined with a new compact representation of subautomata allowing an optimal tabulation-based simulation.
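
The idea of reading several packed characters per step can be illustrated with a small sketch. This is not the algorithm of the paper: it is a plain tabulation of the Knuth-Morris-Pratt automaton over blocks of k characters, and its table has (m + 1) · σ^k entries, whereas the paper relies on a compact representation of subautomata to stay within the stated bound. The function names (`kmp_dfa`, `build_block_table`, `search_packed`), the default DNA alphabet, and the block length k = 4 are assumptions made for illustration only.

```python
from itertools import product


def kmp_dfa(pattern, alphabet):
    """Full KMP automaton: dfa[state][c] is the next state on character code c."""
    m = len(pattern)
    idx = {ch: i for i, ch in enumerate(alphabet)}
    dfa = [[0] * len(alphabet) for _ in range(m + 1)]
    dfa[0][idx[pattern[0]]] = 1
    x = 0  # state reached after reading pattern[1:j]
    for j in range(1, m + 1):
        dfa[j] = list(dfa[x])            # mismatch: behave like the restart state
        if j < m:
            c = idx[pattern[j]]
            dfa[j][c] = j + 1            # match: advance
            x = dfa[x][c]
    return dfa


def build_block_table(dfa, sigma, k):
    """table[state][block] = (next state, offsets in the block where a match ends)."""
    m = len(dfa) - 1
    table = []
    for state in range(m + 1):
        row = []
        # product() enumerates blocks in base-sigma order, first character
        # most significant, which matches the packing used in search_packed.
        for block in product(range(sigma), repeat=k):
            s, hits = state, []
            for off, c in enumerate(block):
                s = dfa[s][c]
                if s == m:
                    hits.append(off)
            row.append((s, tuple(hits)))
        table.append(row)
    return table


def search_packed(pattern, text, alphabet="ACGT", k=4):
    """Return all start positions of pattern in text, reading k characters per lookup."""
    sigma = len(alphabet)
    idx = {ch: i for i, ch in enumerate(alphabet)}
    dfa = kmp_dfa(pattern, alphabet)
    table = build_block_table(dfa, sigma, k)
    m, n = len(pattern), len(text)

    # Pack each full block of k characters into one base-sigma integer.
    full = n - n % k
    words = []
    for i in range(0, full, k):
        w = 0
        for ch in text[i:i + k]:
            w = w * sigma + idx[ch]
        words.append(w)

    state, occ = 0, []
    for b, w in enumerate(words):        # one table lookup per packed block
        state, hits = table[state][w]
        occ.extend(b * k + off - m + 1 for off in hits)
    for i in range(full, n):             # leftover characters, one at a time
        state = dfa[state][idx[text[i]]]
        if state == m:
            occ.append(i - m + 1)
    return occ


print(search_packed("ACA", "ACACAGACA"))  # [0, 2, 6]
```

In this sketch the main loop performs one lookup per block rather than one per character, which is the source of the speedup on packed input; the cost is moved into building the table, which is why the size of the tabulated structure matters.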