Regular expression searching on compressed text

Authors: Gonzalo Navarro
Affiliations: Department of Computer Science, University of Chile, Blanco Encalada 2120, Santiago, Chile
Venue: Journal of Discrete Algorithms
Year: 2003

Abstract

We present a solution to the problem of regular expression searching on compressed text. The format we choose is the Ziv-Lempel family, specifically the LZ78 and LZW variants. Given a text of length u compressed into length n, and a pattern of length m, we report all the R occurrences of the pattern in the text in O(2^m + mn + Rm log m) worst case time. On average this drops to O(m^2 + (n + Rm) log m) or O(m^2 + n + Ru/n) for most regular expressions. This is the first nontrivial result for this problem. The experimental results show that our compressed search algorithm needs half the time necessary for decompression plus searching, which is currently the only alternative.
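
For orientation, the following is a minimal sketch of the baseline the abstract compares against: decompress an LZ78-encoded text, then run an ordinary regular expression search over the result. It is not the compressed-domain algorithm of the paper. The function names (lz78_compress, lz78_decompress, search_baseline) and the representation of the compressed text as a list of (phrase index, character) pairs are assumptions made for illustration only.

```python
import re


def lz78_compress(text):
    """Encode text as LZ78 blocks (phrase_index, char); index 0 is the empty phrase."""
    dictionary = {"": 0}
    blocks = []
    current = ""
    for ch in text:
        if current + ch in dictionary:
            # Keep extending the current phrase while it is already known.
            current += ch
        else:
            # Emit (longest known prefix, new character) and register the new phrase.
            blocks.append((dictionary[current], ch))
            dictionary[current + ch] = len(dictionary)
            current = ""
    if current:
        # Flush a trailing phrase that was already in the dictionary.
        # Its prefix is also in the dictionary because LZ78 phrases are prefix-closed.
        blocks.append((dictionary[current[:-1]], current[-1]))
    return blocks


def lz78_decompress(blocks):
    """Rebuild the original text from LZ78 blocks."""
    phrases = [""]
    out = []
    for index, ch in blocks:
        phrase = phrases[index] + ch
        phrases.append(phrase)
        out.append(phrase)
    return "".join(out)


def search_baseline(blocks, pattern):
    """Baseline only: decompress fully, then report the end position of each match."""
    text = lz78_decompress(blocks)
    return [m.end() for m in re.finditer(pattern, text)]


if __name__ == "__main__":
    blocks = lz78_compress("abracadabra")
    print(len(blocks), "blocks")
    print(search_baseline(blocks, r"a(b|c)"))
```

In the abstract's notation, the length of the list returned by lz78_compress plays the role of n and the length of the original string plays the role of u. The baseline always touches all u characters, whereas the bounds stated in the abstract depend on n rather than u, apart from the reporting term.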