The exact pattern matching problem is to find all occurrences of a pattern p in a text t. We say that a pattern matching algorithm is optimal if its running time is linear in the sizes of t and p, i.e., O(|t|+|p|). Perhaps one of the most interesting settings of the pattern matching problem is when one has to design an efficient algorithm with the help of only a small amount of extra space. In this paper we explore this setting to the extreme. We work under the assumption that the text t is available only in a compressed form, represented by a straight-line program. Compression methods based on the efficient construction of straight-line programs are as competitive as the compression standards, including the Lempel-Ziv compression scheme and the recently intensively studied text compression via block sorting, due to Burrows and Wheeler. Our main result is an algorithm that solves the compressed string matching problem in optimal linear time, using constant extra space. We also discuss an efficient implementation of a version of our algorithm, showing that the new concept may also have some interesting real applications. Our result is in contrast with many other compressed pattern matching algorithms, where the goal is to find all pattern occurrences in time related to the size of the compressed text. However, one must remember that all previous algorithms used at least linear (in the compressed text, a dictionary, or the pattern) extra memory, while our algorithm can be implemented in constant extra space. Also, from the practical point of view, when the compression ratio is constant (very rarely smaller than 25%), there is no dramatic difference between a running time based on the size of the compressed text and one based on the size of the original text, while extra space resources might be strictly limited.
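To make the setting concrete, the following minimal sketch shows what a text represented by a straight-line program can look like and how a pattern can be searched for in time linear in the size of the original text. It is not the algorithm of the paper: the rule encoding, the names `expand` and `find_all`, and the use of a standard Knuth-Morris-Pratt matcher are assumptions made for illustration, and the sketch uses extra memory proportional to the pattern length and to the expansion stack rather than constant extra space.

```python
from typing import Dict, Iterator, List, Tuple, Union

# Hypothetical encoding: a rule is either a single character (terminal)
# or a pair of rule names whose expansions are concatenated.
Rule = Union[str, Tuple[str, str]]


def expand(slp: Dict[str, Rule], start: str) -> Iterator[str]:
    """Yield the characters generated by rule `start`, left to right."""
    stack = [start]  # explicit stack; NOT constant space
    while stack:
        rule = slp[stack.pop()]
        if isinstance(rule, str):
            yield rule
        else:
            left, right = rule
            stack.append(right)  # right part is expanded after the left
            stack.append(left)


def find_all(slp: Dict[str, Rule], start: str, pattern: str) -> List[int]:
    """Return start positions of `pattern` in the text generated by `start`.

    Uses the classical KMP failure table, i.e. O(|p|) extra memory.
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k

    hits: List[int] = []
    k = 0
    for pos, ch in enumerate(expand(slp, start)):
        while k and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(pos - len(pattern) + 1)
            k = fail[k - 1]  # allow overlapping occurrences
    return hits


# Six rules generating the 8-character Fibonacci word "abaababa".
slp: Dict[str, Rule] = {
    "X1": "b",
    "X2": "a",
    "X3": ("X2", "X1"),
    "X4": ("X3", "X2"),
    "X5": ("X4", "X3"),
    "X6": ("X5", "X4"),
}

print("".join(expand(slp, "X6")))   # abaababa
print(find_all(slp, "X6", "aba"))   # [0, 3, 5]
```

The text is never materialized as a whole: characters are streamed out of the grammar and consumed by the matcher one at a time, so the running time is O(|t|+|p|). The remaining memory overhead in this sketch (the failure table and the expansion stack) is exactly what a constant-extra-space method has to avoid.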