We present a solution to the problem of regular expression searching on compressed text. The format we choose is the Ziv-Lempel family, specifically the LZ78 and LZW variants. Given a text of length $u$ compressed into length $n$, and a pattern of length $m$, we report all the $R$ occurrences of the pattern in the text in $O(2^m + mn + Rm \log m)$ worst case time. On average this drops to $O(m^2 + (n + Rm) \log m)$ or $O(m^2 + n + Ru/n)$ for most regular expressions. This is the first nontrivial result for this problem. The experimental results show that our compressed search algorithm needs half the time necessary for decompression plus searching, which is currently the only alternative.
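To make the quantities $u$ and $n$ concrete, here is a minimal Python sketch of an LZ78 parse together with the "decompress, then search" baseline that the abstract compares against. It is not the compressed-domain algorithm of the paper. The function names `lz78_compress`, `lz78_decompress`, and `baseline_search` are hypothetical, and the phrase encoding (a list of `(reference, character)` pairs) is an assumption chosen for readability rather than an actual on-disk format.

```python
import re

def lz78_compress(text):
    """Parse text into LZ78 phrases: (index of a previous phrase, next character)."""
    dictionary = {"": 0}          # phrase string -> phrase index (0 is the empty phrase)
    phrases = []
    current = ""
    for ch in text:
        if current + ch in dictionary:
            current += ch         # keep extending the longest known phrase
        else:
            phrases.append((dictionary[current], ch))
            dictionary[current + ch] = len(dictionary)
            current = ""
    if current:
        # The leftover is already a known phrase; emit it as (its prefix, last char).
        phrases.append((dictionary[current[:-1]], current[-1]))
    return phrases

def lz78_decompress(phrases):
    """Rebuild the text by expanding each (reference, character) pair."""
    strings = [""]
    out = []
    for ref, ch in phrases:
        s = strings[ref] + ch
        strings.append(s)
        out.append(s)
    return "".join(out)

def baseline_search(phrases, pattern):
    """Baseline only: decompress fully, then report end positions of matches.

    Uses Python's re module, which reports non-overlapping matches.
    """
    text = lz78_decompress(phrases)
    return [m.end() for m in re.finditer(pattern, text)]

text = "abababab"
phrases = lz78_compress(text)
print(len(text), len(phrases))            # u (characters) and n (phrases)
print(baseline_search(phrases, r"aba"))
```

Here $u$ corresponds to `len(text)` and $n$ to `len(phrases)`. The baseline touches all $u$ characters, whereas the bounds in the abstract are stated mainly in terms of $n$, $m$, and $R$.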