Simple and efficient LZW-compressed multiple pattern matching

Authors:
Paweł Gawrychowski
Affiliations:
Institute of Computer Science, University of Wrocław, Wrocław, Poland and Max-Planck-Institut für Informatik, Saarbrücken, Germany
Venue:
Journal of Discrete Algorithms
Year:
2014

Citing 15
Cited 0

Making data structures persistent

Journal of Computer and System Sciences - 18th Annual ACM Symposium on Theory of Computing (STOC), May 28-30, 1986
String matching in Lempel-Ziv compressed strings

STOC '95 Proceedings of the twenty-seventh annual ACM symposium on Theory of computing
An efficient algorithm for dynamic text indexing

SODA '94 Proceedings of the fifth annual ACM-SIAM symposium on Discrete algorithms
Let sleeping files lie: pattern matching in Z-compressed files

SODA '94 Proceedings of the fifth annual ACM-SIAM symposium on Discrete algorithms
String Matching in Real Time

Journal of the ACM (JACM)
Efficient string matching: an aid to bibliographic search

Communications of the ACM
The LCA Problem Revisited

LATIN '00 Proceedings of the 4th Latin American Symposium on Theoretical Informatics
Pattern Matching in Compressed Texts

Proceedings of the 15th Conference on Foundations of Software Technology and Theoretical Computer Science
Time-space-optimal string matching (Preliminary Report)

STOC '81 Proceedings of the thirteenth annual ACM symposium on Theory of computing
Multiple Pattern Matching in LZW Compressed Text

DCC '98 Proceedings of the Conference on Data Compression
Linear work suffix array construction

Journal of the ACM (JACM)
A Technique for High-Performance Data Compression

Computer
Substring range reporting

CPM'11 Proceedings of the 22nd annual conference on Combinatorial pattern matching
Stronger Lempel-Ziv Based Compressed Text Indexing

Algorithmica
Optimal Pattern Matching in LZW Compressed Strings

ACM Transactions on Algorithms (TALG) - Special Issue on SODA'11

Abstract

We consider a natural variant of the classical multiple pattern matching problem: given a Lempel-Ziv-Welch representation of a string and a collection of (uncompressed) patterns, does any of them occur in the text? As shown by Kida et al. [15], extending the single pattern algorithm of Amir, Benson and Farach [2] gives a running time of O(n + M^2) for the more general case, where n is the number of codewords in the compressed representation of the text and M is the total length of all patterns. We prove that it is in fact possible to achieve O(n log M + M) or O(n + M^{1+ε}) complexity. While not linear, the running times of our solutions match the single pattern bounds achieved by the previously known solutions [2,17], and we obtain them in a more structured and unified manner, without using any combinatorics on words. The only nontrivial components of our method are suffix arrays, constant time range minimum queries, and balanced binary search trees.
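
To make the problem statement concrete, the following Python sketch builds an LZW codeword sequence for a short text and answers the "does any pattern occur?" question by the naive route of decompressing first. It is only an illustrative baseline and not the algorithm of the paper, which works on the n codewords directly. The function names and the fixed-alphabet dictionary initialisation are assumptions made for this example.

```python
def lzw_compress(text, alphabet):
    """Return the LZW codewords of text; every character must be in alphabet."""
    table = {ch: i for i, ch in enumerate(alphabet)}
    codes = []
    current = ""
    for ch in text:
        if current + ch in table:
            # Keep extending the current phrase while it is in the dictionary.
            current += ch
        else:
            codes.append(table[current])
            table[current + ch] = len(table)  # new phrase gets the next code
            current = ch
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes, alphabet):
    """Invert lzw_compress, rebuilding the dictionary on the fly."""
    if not codes:
        return ""
    table = {i: ch for i, ch in enumerate(alphabet)}
    prev = table[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        else:
            # The codeword refers to the phrase that is being created right now.
            entry = prev + prev[0]
        out.append(entry)
        table[len(table)] = prev + entry[0]
        prev = entry
    return "".join(out)


def any_pattern_occurs_naive(codes, alphabet, patterns):
    """Naive baseline: decompress fully, then search for each pattern."""
    text = lzw_decompress(codes, alphabet)
    return any(p in text for p in patterns)


codes = lzw_compress("abababab", "ab")
print(codes)
print(any_pattern_occurs_naive(codes, "ab", ["bb", "baba"]))
```

The baseline spends time proportional to the length of the uncompressed text, which can be much larger than n; the bounds in the abstract are stated in terms of n and M only.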