Practical compressed document retrieval

  • Authors: Gonzalo Navarro; Simon J. Puglisi; Daniel Valenzuela

  • Affiliations: Dept. of Computer Science, University of Chile; School of Computer Science and Information Technology, Royal Melbourne Institute of Technology; Dept. of Computer Science, University of Chile

  • Venue: SEA'11 Proceedings of the 10th International Conference on Experimental Algorithms
  • Year: 2011

Abstract

Recent research on document retrieval for general texts has established the virtues of explicitly representing the so-called document array, which stores, for each pointer of the suffix array, the document it belongs to. While it makes document retrieval faster, this array occupies a significant amount of redundant space and is not easily compressible. In this paper we present the first practical proposal to compress the document array. We show that the resulting structure is significantly smaller than both its uncompressed counterpart and the alternatives to the document array proposed in the literature. We also compare various known algorithms for document listing and top-k retrieval, and find that the most useful combinations of algorithms run over our new compressed document arrays.
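
To make the notion of a document array concrete, the following minimal Python sketch builds an uncompressed one for a toy collection and uses it for document listing. It is an illustration only, not the compressed structure proposed in the paper: the suffix array is built naively, listing is done by scanning the matching suffix-array range, and the names `build_document_array` and `list_documents`, as well as the `"\x00"` separator, are assumptions introduced here.

```python
from bisect import bisect_right


def build_document_array(docs, sep="\x00"):
    """Return (text, suffix_array, document_array) for a small collection.

    Hypothetical helper: assumes `sep` does not occur inside any document.
    """
    # Concatenate the documents, terminating each one with the separator.
    text = "".join(d + sep for d in docs)

    # starts[j] is the position in `text` where document j begins.
    starts = []
    pos = 0
    for d in docs:
        starts.append(pos)
        pos += len(d) + len(sep)

    # Naive suffix array construction, for illustration only.
    sa = sorted(range(len(text)), key=lambda i: text[i:])

    # da[i] is the document that the suffix starting at sa[i] belongs to.
    da = [bisect_right(starts, p) - 1 for p in sa]
    return text, sa, da


def list_documents(text, sa, da, pattern):
    """Return the sorted distinct documents that contain `pattern`."""
    m = len(pattern)

    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    end = lo

    # Brute-force listing: collect distinct entries of da[start:end].
    return sorted(set(da[start:end]))


docs = ["abra", "cadabra", "banana"]
text, sa, da = build_document_array(docs)
print(list_documents(text, sa, da, "bra"))  # documents 0 and 1 contain "bra"
```

Scanning the range costs time proportional to the number of occurrences rather than the number of distinct documents, and the array itself holds one entry per text position; that space cost is the redundancy the abstract refers to.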