Simple and efficient LZW-Compressed multiple pattern matching

Authors:
Paweł Gawrychowski
Affiliations:
Institute of Computer Science, University of Wrocław, Poland; Max-Planck-Institut für Informatik, Saarbrücken, Germany
Venue:
CPM'12 Proceedings of the 23rd Annual conference on Combinatorial Pattern Matching
Year:
2012

References (13 cited, 1 citing):

Making data structures persistent

Journal of Computer and System Sciences - 18th Annual ACM Symposium on Theory of Computing (STOC), May 28-30, 1986
String matching in Lempel-Ziv compressed strings

STOC '95 Proceedings of the twenty-seventh annual ACM symposium on Theory of computing
Let sleeping files lie: pattern matching in Z-compressed files

SODA '94 Proceedings of the fifth annual ACM-SIAM symposium on Discrete algorithms
String Matching in Real Time

Journal of the ACM (JACM)
A fast string searching algorithm

Communications of the ACM
Efficient string matching: an aid to bibliographic search

Communications of the ACM
The LCA Problem Revisited

LATIN '00 Proceedings of the 4th Latin American Symposium on Theoretical Informatics
Pattern Matching in Compressed Texts

Proceedings of the 15th Conference on Foundations of Software Technology and Theoretical Computer Science
Time-space-optimal string matching (Preliminary Report)

STOC '81 Proceedings of the thirteenth annual ACM symposium on Theory of computing
Multiple Pattern Matching in LZW Compressed Text

DCC '98 Proceedings of the Conference on Data Compression
The level ancestor problem simplified

Theoretical Computer Science - Latin American theoretical informatics
A Technique for High-Performance Data Compression

Computer
Optimal pattern matching in LZW compressed strings

Proceedings of the twenty-second annual ACM-SIAM symposium on Discrete Algorithms

Cited by:

Faster fully compressed pattern matching by recompression

ICALP'12 Proceedings of the 39th international colloquium conference on Automata, Languages, and Programming - Volume Part I

Abstract

We consider a natural variant of the classical multiple pattern matching problem: given a Lempel-Ziv-Welch representation of a string $t[1\mathinner{\ldotp\ldotp} N]$ and a collection of (uncompressed) patterns $p_1, p_2, \ldots, p_\ell$ with $\sum_i |p_i| = M$, does any $p_i$ occur in $t$? As shown by Kida et al. [12], extending the single pattern algorithm of Amir, Benson and Farach [2] gives a running time of $\mathcal{O}(n+M^{2})$ for the more general case, where $n$ denotes the size of the compressed representation. We prove that it is in fact possible to achieve $\mathcal{O}(n\log M+M)$ or $\mathcal{O}(n+M^{1+\epsilon})$ complexity. While not linear, the running time of our solution matches the single pattern bounds achieved by [2] and [14] in a more structured and unified manner, and without using a lot of combinatorics on words. The only nontrivial components are the suffix array, constant time range minimum queries, and any balanced binary search tree.
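
To make the problem statement concrete, the following is a minimal Python sketch of the naive baseline, not the algorithm of the paper: it builds an LZW codeword sequence for a text, decompresses it again, and then checks each pattern against the full text. The function names (`lzw_compress`, `lzw_decompress`, `any_pattern_occurs`) and the explicit `alphabet` argument are assumptions made for illustration only. The baseline spends time proportional to the uncompressed length, which is exactly what the compressed-domain bounds quoted in the abstract avoid.

```python
def lzw_compress(text, alphabet):
    """Return the list of LZW codewords for `text` (hypothetical helper)."""
    # The dictionary starts with one codeword per alphabet symbol.
    table = {ch: i for i, ch in enumerate(alphabet)}
    codes = []
    current = ""
    for ch in text:
        if current + ch in table:
            current += ch  # keep extending the longest known phrase
        else:
            codes.append(table[current])
            table[current + ch] = len(table)  # new phrase = old phrase + 1 symbol
            current = ch
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes, alphabet):
    """Rebuild the text from its LZW codewords (hypothetical helper)."""
    if not codes:
        return ""
    table = list(alphabet)
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code < len(table):
            entry = table[code]
        else:
            # The codeword refers to the phrase that is being defined right now.
            entry = previous + previous[0]
        table.append(previous + entry[0])
        out.append(entry)
        previous = entry
    return "".join(out)


def any_pattern_occurs(codes, alphabet, patterns):
    """Naive baseline: decompress fully, then test each pattern."""
    text = lzw_decompress(codes, alphabet)
    return any(p in text for p in patterns)


codes = lzw_compress("abababababab", "ab")
print(len(codes), any_pattern_occurs(codes, "ab", ["bb", "baba"]))
```

Here the number of codewords plays the role of the compressed size, the length of the decompressed string is the text length, and the total length of `patterns` is the combined pattern length.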