We introduce the first grammar-compressed representation of a sequence that supports searches in time that depends only logarithmically on the size of the grammar. Given a text $T[1..u]$ represented by a context-free grammar with n symbols (terminals and nonterminals) and size N (the sum of the lengths of the right-hand sides of the rules), a basic grammar-based representation of T takes $N\lg n$ bits of space. Our representation requires $2N\lg n + N\lg u + \epsilon\, n\lg n + o(N\lg n)$ bits of space, for any $0 < \epsilon \le 1$. It can find the positions of the occ occurrences of a pattern of length m in T in $O\left((m^2/\epsilon)\lg \left(\frac{\lg u}{\lg n}\right) + (m+occ)\lg n\right)$ time, and extract any substring of length ℓ of T in $O(\ell+h\lg(N/h))$ time, where h is the height of the grammar tree.
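To make the parameters concrete, the following minimal sketch computes n, N, u, and h for a small toy grammar and evaluates the leading terms of the two space bounds, taking lg as the base-2 logarithm. The grammar `RULES`, the function names, and the convention of measuring h as the number of edges from the root to the deepest terminal leaf are all assumptions made for illustration; none of this is the paper's index structure, only the arithmetic behind its size measures.

```python
from math import log2

# Hypothetical toy grammar: every nonterminal has exactly one rule,
# so the grammar generates exactly one text T.
RULES = {
    "S": ["A", "B", "A"],
    "A": ["a", "b"],
    "B": ["A", "c"],
}


def expand(symbol, rules):
    """Return the string generated by `symbol` (terminals expand to themselves)."""
    if symbol not in rules:
        return symbol
    return "".join(expand(s, rules) for s in rules[symbol])


def height(symbol, rules):
    """Height of the parse tree rooted at `symbol`, counted in edges (assumed convention)."""
    if symbol not in rules:
        return 0
    return 1 + max(height(s, rules) for s in rules[symbol])


def grammar_stats(start, rules):
    """Return (n, N, u, h) as defined in the abstract."""
    terminals = {s for rhs in rules.values() for s in rhs if s not in rules}
    n = len(rules) + len(terminals)              # terminal and nonterminal symbols
    N = sum(len(rhs) for rhs in rules.values())  # total right-hand-side length
    u = len(expand(start, rules))                # length of the generated text
    h = height(start, rules)                     # height of the grammar tree
    return n, N, u, h


def basic_bits(n, N):
    """Leading term of the basic representation: N lg n bits."""
    return N * log2(n)


def index_bits(n, N, u, eps):
    """Leading terms of the index bound; the o(N lg n) term is omitted."""
    if not 0 < eps <= 1:
        raise ValueError("eps must satisfy 0 < eps <= 1")
    return 2 * N * log2(n) + N * log2(u) + eps * n * log2(n)


if __name__ == "__main__":
    n, N, u, h = grammar_stats("S", RULES)
    print("text:", expand("S", RULES))
    print("n, N, u, h:", n, N, u, h)
    print("basic bits (approx.):", round(basic_bits(n, N), 2))
    print("index bits, eps=1 (approx.):", round(index_bits(n, N, u, 1.0), 2))
```

On a grammar this small the index bound is several times the basic bound, which is expected: the asymptotic statement becomes interesting only for highly repetitive texts, where N is much smaller than u.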