A faster grammar-based self-index

  • Authors: Travis Gagie, Paweł Gawrychowski, Juha Kärkkäinen, Yakov Nekrich

  • Affiliations: Aalto University, Finland; University of Wrocław, Poland; University of Helsinki, Finland; University of Bonn, Germany

  • Venue: LATA'12 Proceedings of the 6th International Conference on Language and Automata Theory and Applications
  • Year: 2012

Abstract

To store and search genomic databases efficiently, researchers have recently started building compressed self-indexes based on straight-line programs and LZ77. In this paper we show how, given a balanced straight-line program for a string S[1..n] whose LZ77 parse consists of z phrases, we can add O(z log log z) words and obtain a compressed self-index for S such that, given a pattern P[1..m], we can list the occ occurrences of P in S in O(m^2 + (m + occ) log log n) time. All previous self-indexes are either larger or slower in the worst case.
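For readers unfamiliar with the terms in the abstract, the following Python sketch illustrates what a straight-line program (SLP) and an LZ77 parse are. It is not the index from the paper: the grammar `RULES`, the function names, and the particular LZ77 variant (greedy, overlapping sources allowed, a fresh character forming its own phrase) are all assumptions chosen for illustration, and the search routine simply expands the text rather than achieving the stated time bound.

```python
from functools import lru_cache

# Hypothetical toy SLP: each rule is either a terminal character or a pair
# of rule names (left, right). Every symbol derives exactly one string.
RULES = {
    "A": "a",
    "B": "b",
    "X1": ("A", "B"),    # ab
    "X2": ("X1", "A"),   # aba
    "X3": ("X2", "X1"),  # abaab
    "X4": ("X3", "X2"),  # abaababa
}

@lru_cache(maxsize=None)
def length(sym):
    """Length of the string derived from sym."""
    rule = RULES[sym]
    if isinstance(rule, str):
        return 1
    left, right = rule
    return length(left) + length(right)

def access(sym, i):
    """Return character i (0-based) of the string derived from sym
    by walking down the parse tree, without expanding the whole string."""
    while True:
        rule = RULES[sym]
        if isinstance(rule, str):
            return rule
        left, right = rule
        if i < length(left):
            sym = left
        else:
            i -= length(left)
            sym = right

def expand(sym):
    """Decompress the full string derived from sym."""
    return "".join(access(sym, i) for i in range(length(sym)))

def occurrences(sym, pattern):
    """Naive baseline: list all start positions of pattern by full expansion."""
    text = expand(sym)
    return [i for i in range(len(text) - len(pattern) + 1)
            if text.startswith(pattern, i)]

def lz77_phrases(s):
    """Greedy LZ77 parse (assumed variant): each phrase is the longest prefix
    of the remaining text that also occurs starting at an earlier position,
    or a single character if no such prefix exists."""
    phrases, i = [], 0
    while i < len(s):
        l = 0
        # s.find(sub, 0, i + l) only accepts matches that start before i.
        while i + l < len(s) and s.find(s[i:i + l + 1], 0, i + l) != -1:
            l += 1
        l = max(l, 1)
        phrases.append(s[i:i + l])
        i += l
    return phrases

if __name__ == "__main__":
    text = expand("X4")
    print(text, occurrences("X4", "aba"), lz77_phrases(text))
```

Here `access` takes time proportional to the height of the grammar, which is why a balanced SLP is a useful starting point, and `len(lz77_phrases(text))` is the quantity written z in the abstract.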