Dynamic entropy-compressed sequences and full-text indexes

  • Authors:
  • Veli Mäkinen; Gonzalo Navarro

  • Affiliations:
  • Department of Computer Science, University of Helsinki, Finland; Department of Computer Science, University of Chile

  • Venue:
  • CPM'06: Proceedings of the 17th Annual Conference on Combinatorial Pattern Matching
  • Year:
  • 2006

Abstract

Given a sequence of n bits with binary zero-order entropy H0, we present a dynamic data structure that requires nH0 + o(n) bits of space and supports rank and select, as well as insertion and deletion of bits at arbitrary positions, in O(log n) worst-case time. This extends previous results by Hon et al. [ISAAC 2003], which achieve O(log n / log log n) time for rank and select but Θ(polylog(n)) amortized time for inserting and deleting bits, and require n + o(n) bits of space; and by Raman et al. [SODA 2002], which achieve constant query time but only for a static structure. In particular, ours is the first entropy-bounded dynamic data structure for rank and select over bit sequences.

We then show how this result can be used to build a dynamic full-text self-index for a collection of texts over an alphabet of size σ, with overall length n and zero-order entropy H0. The index requires nH0 + o(n log σ) bits of space and can count the occurrences of a pattern of length m in O(m log n log σ) time. Reporting the occ occurrences takes O(occ log² n log σ) time, at the cost of O(n) extra space. Inserting text into the collection takes O(log n log σ) time per symbol, while deleting text takes O(log² n log σ) time per symbol. This improves a previous result by Chan et al. [CPM 2004]. As a consequence, we obtain an O(n log n log σ)-time construction algorithm for a compressed self-index that requires nH0 + o(n log σ) bits of working space during construction.
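The rank/select/insert/delete interface for bit sequences described above can be illustrated with the minimal Python sketch below. It is not the authors' structure: it stores bits uncompressed (roughly one machine word per bit, not nH0 + o(n) bits) in a flat list of small blocks that is scanned linearly, so each operation costs about O(n/b + b) time for block size b rather than O(log n). The class name `DynamicBitVector` and the `block_size` parameter are hypothetical, chosen only for illustration. Here rank1(i) counts 1s in positions 0..i-1 and select1(j) returns the position of the j-th 1 (j starting at 1).

```python
class DynamicBitVector:
    """Hypothetical, uncompressed dynamic bit sequence (illustration only)."""

    def __init__(self, bits=(), block_size=64):
        self.block_size = block_size  # hypothetical tuning parameter
        self.blocks = []  # each block is a small Python list of 0/1 values
        self.ones = []    # number of 1 bits stored in each block
        for b in bits:
            self.insert(len(self), b)

    def __len__(self):
        return sum(len(blk) for blk in self.blocks)

    def _locate(self, i):
        # Map global position i to (block index, offset inside that block).
        for k, blk in enumerate(self.blocks):
            if i < len(blk):
                return k, i
            i -= len(blk)
        raise IndexError("position out of range")

    def access(self, i):
        k, off = self._locate(i)
        return self.blocks[k][off]

    def insert(self, i, bit):
        """Insert bit (0 or 1) so that it ends up at position i."""
        if not 0 <= i <= len(self):
            raise IndexError("position out of range")
        if not self.blocks:
            self.blocks.append([])
            self.ones.append(0)
        if i == len(self):
            k, off = len(self.blocks) - 1, len(self.blocks[-1])  # append at end
        else:
            k, off = self._locate(i)
        self.blocks[k].insert(off, bit)
        self.ones[k] += bit
        if len(self.blocks[k]) > 2 * self.block_size:
            # Split an overfull block in half so blocks stay small.
            blk = self.blocks[k]
            left, right = blk[:len(blk) // 2], blk[len(blk) // 2:]
            self.blocks[k:k + 1] = [left, right]
            self.ones[k:k + 1] = [sum(left), sum(right)]

    def delete(self, i):
        """Remove and return the bit at position i."""
        if not 0 <= i < len(self):
            raise IndexError("position out of range")
        k, off = self._locate(i)
        bit = self.blocks[k].pop(off)
        self.ones[k] -= bit
        if not self.blocks[k]:
            # Drop empty blocks so scans stay short.
            del self.blocks[k]
            del self.ones[k]
        return bit

    def rank1(self, i):
        """Number of 1 bits in positions 0..i-1."""
        total = 0
        for blk, cnt in zip(self.blocks, self.ones):
            if i >= len(blk):
                total += cnt  # whole block lies before position i
                i -= len(blk)
            else:
                return total + sum(blk[:i])
        return total

    def rank0(self, i):
        return i - self.rank1(i)

    def select1(self, j):
        """Position of the j-th 1 bit, counting from j = 1."""
        if j < 1:
            raise ValueError("j must be at least 1")
        base = 0
        for blk, cnt in zip(self.blocks, self.ones):
            if j > cnt:
                j -= cnt  # skip this block using its stored 1-count
                base += len(blk)
            else:
                for off, b in enumerate(blk):
                    j -= b
                    if j == 0:
                        return base + off
        raise ValueError("fewer than j 1 bits")
```

For example, `DynamicBitVector([1, 0, 1, 1, 0], block_size=2)` followed by `insert(2, 1)` holds 1 0 1 1 1 0, so `rank1(4)` is 3 and `select1(4)` is 4. Replacing the flat block list with a balanced tree whose internal nodes store subtree lengths and 1-counts would bring each operation down to logarithmic time; the abstract does not describe how the authors additionally compress the leaves to reach nH0 + o(n) bits.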
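Counting occurrences with a self-index of this kind is commonly done by backward search over the Burrows-Wheeler transform (BWT): the pattern is processed right to left, and each symbol narrows a range of sorted suffixes using two rank queries. The Python sketch below assumes that symbol-wise rank is answered by a wavelet tree of depth about log σ whose nodes hold bit sequences; if each node-level rank costs O(log n) on a dynamic bit sequence, a pattern of length m costs O(m log n log σ), consistent with the counting bound above. For simplicity the sketch is static and is not the authors' implementation: the BWT is built by sorting rotations (quadratic time), and node ranks use prefix-count arrays instead of dynamic compressed bit sequences. The names `bwt`, `WaveletTree`, and `count` are hypothetical, and the text is assumed not to contain the sentinel character "\0".

```python
def bwt(text):
    """BWT via sorted rotations (quadratic; illustration only)."""
    s = text + "\0"  # assumed sentinel: absent from text, sorts first
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)


class WaveletTree:
    """Balanced wavelet tree; each internal node splits the alphabet in half.

    Node bits are kept with a prefix-count array here, whereas a dynamic
    index would store them in dynamic (compressed) bit sequences.
    """

    def __init__(self, seq, alphabet=None):
        self.alphabet = sorted(set(seq)) if alphabet is None else alphabet
        self.leaf = len(self.alphabet) <= 1
        if self.leaf:
            return
        mid = len(self.alphabet) // 2
        self.mid_sym = self.alphabet[mid]  # symbols >= mid_sym go right (bit 1)
        bits = [0 if c < self.mid_sym else 1 for c in seq]
        self.prefix = [0]  # prefix[i] = number of 1 bits in bits[0:i]
        for b in bits:
            self.prefix.append(self.prefix[-1] + b)
        self.left = WaveletTree([c for c in seq if c < self.mid_sym], self.alphabet[:mid])
        self.right = WaveletTree([c for c in seq if c >= self.mid_sym], self.alphabet[mid:])

    def rank(self, c, i):
        """Occurrences of symbol c in seq[0:i], one bit-rank per level."""
        if self.leaf:
            return i if self.alphabet and c == self.alphabet[0] else 0
        ones = self.prefix[i]
        if c < self.mid_sym:
            return self.left.rank(c, i - ones)
        return self.right.rank(c, ones)


def count(text, pattern):
    """Number of occurrences of pattern in text, by backward search."""
    last = bwt(text)
    wt = WaveletTree(last)
    C, total = {}, 0  # C[c] = number of BWT symbols smaller than c
    for c in sorted(set(last)):
        C[c] = total
        total += last.count(c)
    sp, ep = 0, len(last)  # current range [sp, ep) of sorted suffixes
    for c in reversed(pattern):
        if c not in C:
            return 0
        sp = C[c] + wt.rank(c, sp)
        ep = C[c] + wt.rank(c, ep)
        if sp >= ep:
            return 0
    return ep - sp
```

For instance, `count("abracadabra", "abra")` performs four backward steps and finds 2 occurrences. In a dynamic setting, inserting a text symbol would translate into inserting one bit at each of the O(log σ) wavelet-tree levels, which is where a dynamic rank/select bit sequence becomes the key building block.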