Improved compressed indexes for full-text document retrieval

  • Authors:
  • Djamal Belazzougui; Gonzalo Navarro

  • Affiliations:
  • LIAFA, Univ. Paris Diderot - Paris 7, France; Department of Computer Science, University of Chile

  • Venue:
  • SPIRE'11 Proceedings of the 18th international conference on String processing and information retrieval
  • Year:
  • 2011

Abstract

We give new space/time tradeoffs for compressed indexes that answer document retrieval queries on general sequences. On a collection of D documents of total length n, current approaches require at least |CSA| + O(n lg D / lg lg D) or 2|CSA| + o(n) bits of space, where CSA is a full-text index. Using monotone minimum perfect hash functions, we give new algorithms for document listing with frequencies and top-k document retrieval using just |CSA| + O(n lg lg lg D) bits. We also improve current solutions that use 2|CSA| + o(n) bits, and consider other problems such as colored range listing, top-k most important documents, and computing arbitrary frequencies.
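To make the two main query types concrete, the following is a minimal brute-force sketch of what "document listing with frequencies" and "top-k document retrieval" return. It is not the compressed-index method of the paper: it scans every document and uses no CSA or hash functions. The function names and the tie-breaking rule (smaller document identifier first) are assumptions made for illustration only.

```python
def document_listing_with_frequencies(docs, pattern):
    """Map each document id to the number of occurrences of `pattern` in it.

    Only documents with at least one occurrence are reported.
    Overlapping occurrences are counted. Naive baseline, linear scan.
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    freqs = {}
    for doc_id, text in enumerate(docs):
        count = 0
        start = text.find(pattern)
        while start != -1:
            count += 1
            # Advance by one position so overlapping matches are found.
            start = text.find(pattern, start + 1)
        if count:
            freqs[doc_id] = count
    return freqs


def top_k_documents(docs, pattern, k):
    """Return up to k (doc_id, frequency) pairs, most frequent first.

    Ties are broken by smaller doc_id (an illustrative assumption).
    """
    freqs = document_listing_with_frequencies(docs, pattern)
    ranked = sorted(freqs.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


if __name__ == "__main__":
    collection = ["abracadabra", "banana", "cabana"]
    print(document_listing_with_frequencies(collection, "ana"))
    print(top_k_documents(collection, "a", 2))
```

This baseline takes time proportional to the total collection length n per query. The indexes discussed in the abstract aim to answer the same queries using a full-text index plus a small number of extra bits, rather than by scanning the text.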