A faster grammar-based self-index

Authors:
Travis Gagie;Paweł Gawrychowski;Juha Kärkkäinen;Yakov Nekrich
Affiliations:
Aalto University, Finland;University of Wrocław, Poland;University of Helsinki, Finland;University of Bonn, Germany
Venue:
LATA'12 Proceedings of the 6th international conference on Language and Automata Theory and Applications
Year:
2012

Abstract

To store and search genomic databases efficiently, researchers have recently started building compressed self-indexes based on straight-line programs and LZ77. In this paper we show how, given a balanced straight-line program for a string S[1..n] whose LZ77 parse consists of z phrases, we can add O(z log log z) words and obtain a compressed self-index for S such that, given a pattern P[1..m], we can list the occ occurrences of P in S in O(m^2 + (m + occ) log log n) time. All previous self-indexes are either larger or slower in the worst case.
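
The two quantities the bounds depend on, a straight-line program for S and the number z of LZ77 phrases, can be illustrated with a small sketch. This is not the index from the paper: it is a naive quadratic-time illustration, and the names `expand`, `lz77_parse`, and the rule names `X1`..`X5` are hypothetical. It also assumes one common LZ77 variant (each phrase is either a single new character or the longest prefix of the remaining text that also starts at an earlier position, with overlap allowed); the paper's exact definition may differ.

```python
def expand(rules, symbol):
    """Expand a straight-line program symbol into the string it derives.

    `rules` maps each nonterminal to either a single character (terminal rule)
    or a pair of nonterminals (binary rule). This layout is an assumption
    made for illustration only.
    """
    rhs = rules[symbol]
    if isinstance(rhs, str):
        return rhs
    left, right = rhs
    return expand(rules, left) + expand(rules, right)


def lz77_parse(s):
    """Naive greedy LZ77 parse (assumed variant, see the note above).

    Each phrase is the longest prefix of s[i:] that also starts at some
    earlier position j < i (the copy may overlap the phrase itself), or a
    single character if no such prefix exists.
    """
    phrases = []
    i, n = 0, len(s)
    while i < n:
        best = 0
        for j in range(i):
            k = 0
            # j < i, so j + k < i + k < n and both indexes stay in range.
            while i + k < n and s[j + k] == s[i + k]:
                k += 1
            best = max(best, k)
        length = max(best, 1)  # a new character forms a phrase of length 1
        phrases.append(s[i:i + length])
        i += length
    return phrases


# A balanced straight-line program deriving S = "abababab" (n = 8).
rules = {
    "X1": "a",
    "X2": "b",
    "X3": ("X1", "X2"),
    "X4": ("X3", "X3"),
    "X5": ("X4", "X4"),
}

S = expand(rules, "X5")
phrases = lz77_parse(S)
z = len(phrases)
print(S, phrases, z)
```

For a highly repetitive string such as this one, both the grammar and the parse are much smaller than n, which is the setting in which an additive O(z log log z) words is attractive.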