Time/space efficient compressed pattern matching

Authors:
Leszek Gasieniec;Igor Potapov
Affiliations:
Department of Computer Science, University of Liverpool, Liverpool L69 7ZF, UK;Department of Computer Science, University of Liverpool, Liverpool L69 7ZF, UK
Venue:
Fundamenta Informaticae - Special issue on computing patterns in strings
Year:
2002

Abstract

The exact pattern matching problem is to find all occurrences of a pattern p in a text t. A pattern matching algorithm is called optimal if its running time is linear in the sizes of t and p, i.e., O(|t| + |p|). One of the most interesting settings of the problem is the design of an efficient algorithm that uses only a small amount of extra space. In this paper we explore this setting to the extreme. We work under the assumption that the text t is available only in compressed form, represented by a straight-line program. Compression methods based on efficient construction of straight-line programs are competitive with standard compression methods, including the Lempel-Ziv compression scheme and the recently intensively studied text compression via block sorting, due to Burrows and Wheeler. Our main result is an algorithm that solves the compressed string matching problem in optimal linear time, using only constant extra space. We also discuss an efficient implementation of a version of our algorithm, showing that the new concept may also have interesting practical applications. Our result is in contrast with many other compressed pattern matching algorithms, where the goal is to find all pattern occurrences in time related to the size of the compressed text. However, all previous algorithms used at least linear extra memory (in the size of the compressed text, a dictionary, or the pattern), while our algorithm can be implemented in constant extra space. Moreover, from a practical point of view, when the compression ratio is constant (very rarely smaller than 25%), there is no dramatic difference between a running time based on the size of the compressed text and one based on the size of the original text, whereas extra space may be strictly limited.
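
To make the setting concrete, the following is a minimal sketch of a straight-line program and of pattern matching over the text it derives, in time linear in the original text. It is not the algorithm of the paper: it uses the classical Knuth-Morris-Pratt matcher, whose failure table takes space linear in the pattern, and an explicit stack proportional to the depth of the program, so it does not achieve constant extra space. The names `FIB`, `expand`, `failure_table`, and `find_occurrences` are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterator, List, Tuple, Union

# A rule is either a single terminal character or a pair of rule names
# (the concatenation of two previously defined rules).
Rule = Union[str, Tuple[str, str]]

# Illustrative straight-line program deriving the Fibonacci word "abaababa".
FIB: Dict[str, Rule] = {
    "X1": "b",
    "X2": "a",
    "X3": ("X2", "X1"),  # "ab"
    "X4": ("X3", "X2"),  # "aba"
    "X5": ("X4", "X3"),  # "abaab"
    "X6": ("X5", "X4"),  # "abaababa"
}


def expand(slp: Dict[str, Rule], start: str) -> Iterator[str]:
    """Yield the characters of the derived text left to right.

    The text is never stored in full; only a stack of pending rule names is kept.
    """
    stack = [start]
    while stack:
        rule = slp[stack.pop()]
        if isinstance(rule, str):
            yield rule
        else:
            left, right = rule
            stack.append(right)  # processed after the left part
            stack.append(left)


def failure_table(p: str) -> List[int]:
    """Standard KMP failure function: longest proper border of each prefix."""
    fail = [0] * len(p)
    k = 0
    for i in range(1, len(p)):
        while k and p[i] != p[k]:
            k = fail[k - 1]
        if p[i] == p[k]:
            k += 1
        fail[i] = k
    return fail


def find_occurrences(slp: Dict[str, Rule], start: str, p: str) -> List[int]:
    """Return starting positions of p in the text derived from `start`."""
    if not p:
        return []
    fail = failure_table(p)
    hits: List[int] = []
    k = 0  # number of pattern characters currently matched
    for pos, ch in enumerate(expand(slp, start)):
        while k and ch != p[k]:
            k = fail[k - 1]
        if ch == p[k]:
            k += 1
        if k == len(p):
            hits.append(pos - len(p) + 1)
            k = fail[k - 1]
    return hits


print(find_occurrences(FIB, "X6", "aba"))  # [0, 3, 5]
```

The sketch streams the decompressed text one character at a time, which is the sense in which running time is measured against the original text rather than the compressed one.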