Improved grammar-based compressed indexes

  • Authors:
  • Francisco Claude;Gonzalo Navarro

  • Affiliations:
  • David R. Cheriton School of Computer Science, University of Waterloo, Canada;Department of Computer Science, University of Chile, Chile

  • Venue:
  • SPIRE'12 Proceedings of the 19th international conference on String Processing and Information Retrieval
  • Year:
  • 2012

Abstract

We introduce the first grammar-compressed representation of a sequence that supports searches in time that depends only logarithmically on the size of the grammar. Given a text T[1..u] that is represented by a (context-free) grammar of n (terminal and nonterminal) symbols and size N (measured as the sum of the lengths of the right-hand sides of the rules), a basic grammar-based representation of T takes $N\lg n$ bits of space. Our representation requires $2N\lg n + N\lg u + \epsilon\, n\lg n + o(N\lg n)$ bits of space, for any $0 < \epsilon \le 1$. It can find the positions of the occ occurrences of a pattern of length m in T in $O\left((m^2/\epsilon)\lg \left(\frac{\lg u}{\lg n}\right) + (m+occ)\lg n\right)$ time, and extract any substring of length ℓ of T in time $O(\ell+h\lg(N/h))$, where h is the height of the grammar tree.
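
To make the quantities n, N, and u concrete, the following minimal sketch computes them for a toy grammar and evaluates the leading terms of the two space bounds stated above. The grammar `RULES` and the helper names `expand`, `grammar_stats`, and `space_bits` are hypothetical and chosen only for illustration; the sketch assumes that lg denotes the base-2 logarithm and ignores the $o(N\lg n)$ term. It does not implement the index itself.

```python
import math

# Hypothetical toy grammar: each nonterminal maps to its right-hand side.
# Symbols that never appear as a key are terminals.
# This grammar derives the single text "abababab".
RULES = {
    "X": ["a", "b"],
    "Y": ["X", "X"],
    "S": ["Y", "Y"],
}


def expand(symbol, rules):
    """Return the string derived from `symbol`; terminals expand to themselves."""
    if symbol not in rules:
        return symbol
    return "".join(expand(s, rules) for s in rules[symbol])


def grammar_stats(rules, start):
    """Return (n, N, u) as defined in the abstract.

    n: number of distinct terminal and nonterminal symbols
    N: sum of the lengths of the right-hand sides of the rules
    u: length of the text derived from the start symbol
    """
    terminals = {s for rhs in rules.values() for s in rhs if s not in rules}
    n = len(rules) + len(terminals)
    N = sum(len(rhs) for rhs in rules.values())
    u = len(expand(start, rules))
    return n, N, u


def space_bits(n, N, u, eps):
    """Leading terms of both space bounds, in bits (lower-order term omitted).

    Assumes lg is log base 2 and 0 < eps <= 1.
    """
    if not 0 < eps <= 1:
        raise ValueError("eps must satisfy 0 < eps <= 1")
    basic = N * math.log2(n)
    index = 2 * N * math.log2(n) + N * math.log2(u) + eps * n * math.log2(n)
    return basic, index


n, N, u = grammar_stats(RULES, "S")
basic, index = space_bits(n, N, u, eps=1.0)
print(f"n={n}, N={N}, u={u}")
print(f"basic ~ {basic:.1f} bits, index ~ {index:.1f} bits")
```

On a text this small the index is of course larger than the text itself; the bounds become meaningful only for highly repetitive texts, where N is much smaller than u.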