Making data structures persistent
Journal of Computer and System Sciences - 18th Annual ACM Symposium on Theory of Computing (STOC), May 28-30, 1986
String matching in Lempel-Ziv compressed strings
STOC '95 Proceedings of the twenty-seventh annual ACM symposium on Theory of computing
An efficient algorithm for dynamic text indexing
SODA '94 Proceedings of the fifth annual ACM-SIAM symposium on Discrete algorithms
Let sleeping files lie: pattern matching in Z-compressed files
SODA '94 Proceedings of the fifth annual ACM-SIAM symposium on Discrete algorithms
Journal of the ACM (JACM)
Efficient string matching: an aid to bibliographic search
Communications of the ACM
LATIN '00 Proceedings of the 4th Latin American Symposium on Theoretical Informatics
Pattern Matching in Compressed Texts
Proceedings of the 15th Conference on Foundations of Software Technology and Theoretical Computer Science
Time-space-optimal string matching (Preliminary Report)
STOC '81 Proceedings of the thirteenth annual ACM symposium on Theory of computing
Multiple Pattern Matching in LZW Compressed Text
DCC '98 Proceedings of the Conference on Data Compression
Linear work suffix array construction
Journal of the ACM (JACM)
CPM'11 Proceedings of the 22nd annual conference on Combinatorial pattern matching
Stronger Lempel-Ziv Based Compressed Text Indexing
Algorithmica
Optimal Pattern Matching in LZW Compressed Strings
ACM Transactions on Algorithms (TALG) - Special Issue on SODA'11
We consider a natural variant of the classical multiple pattern matching problem: given a Lempel-Ziv-Welch representation of a string and a collection of (uncompressed) patterns, does any of them occur in the text? As shown by Kida et al. [15], extending the single-pattern algorithm of Amir, Benson and Farach [2] gives a running time of O(n + M^2) for the more general case, where n is the number of codewords in the compressed representation of the text and M is the sum of the lengths of all patterns. We prove that it is in fact possible to achieve O(n log M + M) or O(n + M^{1+ε}) complexity. While not linear, the running times of our solutions match the single-pattern bounds achieved by the previously known solutions [2,17], and they do so in a more structured and unified manner, without using any combinatorics on words. The only nontrivial components of our method are suffix arrays, constant-time range minimum queries, and balanced binary search trees.
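To make the problem statement concrete, here is a minimal Python sketch of a naive baseline: it decompresses the LZW codewords in full and then checks each pattern directly. This is not the method of the abstract, which avoids decompression and relies on suffix arrays, range minimum queries, and balanced search trees. The function names, the fixed initial alphabet, and the integer codeword encoding are all assumptions made for illustration.

```python
def lzw_compress(text, alphabet):
    """Encode text as a list of LZW codewords (hypothetical helper).

    Assumes every character of text appears in alphabet.
    """
    table = {ch: i for i, ch in enumerate(alphabet)}
    codes = []
    current = ""
    for ch in text:
        if current + ch in table:
            current += ch
        else:
            codes.append(table[current])
            table[current + ch] = len(table)  # new codeword = old phrase + one char
            current = ch
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes, alphabet):
    """Rebuild the text from its LZW codewords."""
    table = list(alphabet)
    if not codes:
        return ""
    prev = table[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code < len(table):
            entry = table[code]
        elif code == len(table):
            # Codeword refers to the phrase that is being defined right now.
            entry = prev + prev[0]
        else:
            raise ValueError("invalid LZW codeword")
        out.append(entry)
        table.append(prev + entry[0])
        prev = entry
    return "".join(out)


def any_pattern_occurs(codes, alphabet, patterns):
    """Naive baseline: decompress fully, then test each pattern.

    Time and space depend on the uncompressed length, which can be far
    larger than the number of codewords n.
    """
    text = lzw_decompress(codes, alphabet)
    return any(p in text for p in patterns)


codes = lzw_compress("abababab", "ab")
print(any_pattern_occurs(codes, "ab", ["bb", "aba"]))
```

The baseline pays for the full uncompressed text, whereas the bounds quoted in the abstract depend only on the number of codewords n and the total pattern length M.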