Self-indexes aim to represent text collections in a compressed format that allows extracting arbitrary portions and also supports indexed searching on the collection. Current self-indexes are unable to fully exploit the redundancy of highly repetitive text collections, which arise in several applications. Grammar-based compression is well suited to exploit such repetitiveness. We introduce the first grammar-based self-index. It builds on Straight-Line Programs (SLPs), a rather general class of context-free grammars. If an SLP of n rules represents a text T[1, u], then an SLP-compressed representation of T requires 2n log₂ n bits. For that same SLP, our self-index takes O(n log n) + n log₂ u bits. It extracts any text substring of length m in time O((m + h) log n), and finds occ occurrences of a pattern string of length m in time O((m(m + h) + h occ) log n), where h is the height of the parse tree of the SLP. No previous grammar representation had achieved o(n) search time. As byproducts we introduce (i) a representation of SLPs that takes 2n log₂ n (1 + o(1)) bits and efficiently supports more operations than a plain array of rules; (ii) a representation for binary relations with labels that supports various extended queries; and (iii) a generalization of our self-index to grammar compressors that reduce T to a sequence of terminals and nonterminals, such as Re-Pair and LZ78.
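The following is a minimal Python sketch that makes the SLP model concrete. It assumes the standard SLP form, in which every rule either derives a single terminal or concatenates two previously defined rules. The rule numbering and the choice of the highest-numbered rule as the one deriving T are hypothetical conventions made for illustration. The sketch stores each rule's expansion length and extracts T[i..j] by descending from the root, visiting only the rules whose expansions overlap the requested range. It is not the paper's compact representation or self-index. It only illustrates why extraction cost depends on both m and the parse-tree height h.

```python
from typing import Dict, List, Tuple, Union

# A rule is either a terminal character or a pair (left, right) of rule ids.
Rule = Union[str, Tuple[int, int]]


class SLP:
    """Toy straight-line program; hypothetical encoding with integer rule ids."""

    def __init__(self, rules: Dict[int, Rule]) -> None:
        self.rules = rules
        self.length: Dict[int, int] = {}
        # Process rules in id order so both children already have a length.
        for x in sorted(rules):
            r = rules[x]
            if isinstance(r, str):
                self.length[x] = 1
            else:
                left, right = r
                if left >= x or right >= x:
                    raise ValueError("SLP rules may only refer to smaller ids")
                self.length[x] = self.length[left] + self.length[right]
        # Assumption: the highest-numbered rule derives the whole text T.
        self.root = max(rules)

    def extract(self, i: int, j: int) -> str:
        """Return T[i..j] (1-based, inclusive) without expanding all of T."""
        if not 1 <= i <= j <= self.length[self.root]:
            raise IndexError("range outside T")
        out: List[str] = []
        self._extract(self.root, i, j, out)
        return "".join(out)

    def text(self) -> str:
        """Expand the full text T (for checking only)."""
        return self.extract(1, self.length[self.root])

    def _extract(self, x: int, i: int, j: int, out: List[str]) -> None:
        # Invariant: 1 <= i <= j <= length[x], positions relative to x's expansion.
        r = self.rules[x]
        if isinstance(r, str):
            out.append(r)
            return
        left, right = r
        split = self.length[left]
        if i <= split:  # part of the range lies in the left child
            self._extract(left, i, min(j, split), out)
        if j > split:  # part of the range lies in the right child
            self._extract(right, max(i, split + 1) - split, j - split, out)


# Fibonacci-word grammar: rule 6 derives "abaababa".
slp = SLP({1: "a", 2: "b", 3: (1, 2), 4: (3, 1), 5: (4, 3), 6: (5, 4)})
print(slp.extract(3, 6))  # aaba
```

Each call touches at most two partially covered rules per level of the parse tree, plus the subtrees lying entirely inside [i, j]. Those subtrees have O(m) nodes in total. This is where the m + h shape of the extraction bound comes from.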