Random access to grammar-compressed strings

Authors:
Philip Bille; Gad M. Landau; Rajeev Raman; Kunihiko Sadakane; Srinivasa Rao Satti; Oren Weimann
Affiliations:
Technical University of Denmark, Denmark; University of Haifa, Israel; University of Leicester, UK; National Institute of Informatics, Japan; Seoul National University, S. Korea; Weizmann Institute of Science, Israel
Venue:
Proceedings of the twenty-second annual ACM-SIAM symposium on Discrete Algorithms
Year:
2011

Abstract

Let S be a string of length N compressed into a context-free grammar 𝒮 of size n. We present two representations of 𝒮 achieving O(log N) random access time, and either O(n · α_k(n)) construction time and space on the pointer machine model, or O(n) construction time and space on the RAM. Here, α_k(n) is the inverse of the k-th row of Ackermann's function. Our representations also efficiently support decompression of any substring in S: we can decompress any substring of length m in the same complexity as a single random access query and additional O(m) time. Combining these results with fast algorithms for uncompressed approximate string matching leads to several efficient algorithms for approximate string matching on grammar-compressed strings without decompression. For instance, we can find all approximate occurrences of a pattern P with at most k errors in time O(n(min{|P|k, k^4 + |P|} + log N) + occ), where occ is the number of occurrences of P in S. Finally, we are able to generalize our results to navigation and other operations on grammar-compressed trees. All of the above bounds significantly improve the currently best known results. To achieve these bounds, we introduce several new techniques and data structures of independent interest, including a predecessor data structure, two "biased" weighted ancestor data structures, and a compact representation of heavy-paths in grammars.
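
To make the random access problem concrete, the following is a minimal sketch of the naive baseline, not the data structure described in the abstract. It assumes the grammar is given as a straight-line program in which every rule is either a single character or a pair of earlier rules. It stores the expansion length of each rule and walks down from the start rule, so a query costs time proportional to the height of the grammar, which can be much larger than log N when the grammar is unbalanced. All names here (`rules`, `order`, `expansion_lengths`, `access`, `extract`) are hypothetical and chosen for illustration only.

```python
from typing import Dict, List, Tuple, Union

# A rule is a terminal character or a pair of rule names (assumed input format).
Rule = Union[str, Tuple[str, str]]


def expansion_lengths(rules: Dict[str, Rule], order: List[str]) -> Dict[str, int]:
    """Length of each rule's expansion; `order` lists rules after the rules they use."""
    length: Dict[str, int] = {}
    for name in order:
        rhs = rules[name]
        if isinstance(rhs, str):
            length[name] = 1
        else:
            left, right = rhs
            length[name] = length[left] + length[right]
    return length


def access(rules: Dict[str, Rule], length: Dict[str, int], start: str, i: int) -> str:
    """Return character i (0-based) of the expansion of `start`, without decompressing."""
    if not 0 <= i < length[start]:
        raise IndexError(i)
    node = start
    while True:
        rhs = rules[node]
        if isinstance(rhs, str):
            return rhs
        left, right = rhs
        if i < length[left]:
            node = left            # position falls in the left part
        else:
            i -= length[left]      # skip the left part entirely
            node = right


def extract(rules: Dict[str, Rule], length: Dict[str, int],
            start: str, i: int, m: int) -> str:
    """Return the substring of length m starting at position i of the expansion."""
    lo, hi = i, i + m
    out: List[str] = []
    stack = [(start, 0)]           # (rule, offset of its expansion in the string)
    while stack:
        node, off = stack.pop()
        if off >= hi or off + length[node] <= lo:
            continue               # expansion lies wholly outside the requested range
        rhs = rules[node]
        if isinstance(rhs, str):
            out.append(rhs)
        else:
            left, right = rhs
            stack.append((right, off + length[left]))
            stack.append((left, off))   # left is popped first, preserving order
    return "".join(out)


# Example grammar: Fibonacci-style rules, X6 expands to "abaababa".
rules: Dict[str, Rule] = {
    "X1": "b", "X2": "a",
    "X3": ("X2", "X1"), "X4": ("X3", "X2"),
    "X5": ("X4", "X3"), "X6": ("X5", "X4"),
}
order = ["X1", "X2", "X3", "X4", "X5", "X6"]
length = expansion_lengths(rules, order)
```

With this example grammar, `access(rules, length, "X6", 4)` walks X6, X5, X4, X3, X1 and returns the character at position 4 of the expanded string, and `extract(rules, length, "X6", 2, 4)` returns the four characters starting at position 2. The substring routine visits the two boundary paths plus the rules fully inside the range, so under these assumptions it runs in time proportional to the grammar height plus m, mirroring the "one access plus O(m)" shape of the bound stated above but without the O(log N) guarantee.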