Top-k document retrieval in optimal time and linear space

  • Authors: Gonzalo Navarro; Yakov Nekrich

  • Affiliations: University of Chile; University of Chile

  • Venue: Proceedings of the twenty-third annual ACM-SIAM symposium on Discrete Algorithms
  • Year: 2012

Abstract

We describe a data structure that uses O(n)-word space and reports the k most relevant documents that contain a query pattern P in optimal O(|P| + k) time. Our construction supports a wide range of important relevance measures, such as the frequency of P in a document and the minimal distance between two occurrences of P in a document. We also show how to reduce the space of the data structure from O(n log n) bits to O(n (log σ + log D + log log n)) bits, where σ is the alphabet size and D is the total number of documents.
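
To make the query semantics concrete, the following is a minimal brute-force sketch of top-k retrieval under the frequency measure. It is not the data structure described in the abstract: it scans every document on each query, so its running time grows with the total text length rather than O(|P| + k). The function names, the tie-breaking rule (smaller document index first), and the choice to count overlapping occurrences are assumptions made for illustration.

```python
def count_occurrences(text, pattern):
    """Count occurrences of pattern in text, allowing overlaps."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        # Advance by one position so overlapping matches are counted.
        start = text.find(pattern, start + 1)
    return count


def top_k_by_frequency(documents, pattern, k):
    """Return up to k (doc_index, frequency) pairs, most frequent first.

    Only documents containing the pattern at least once are reported.
    Ties are broken by smaller document index (an assumption of this sketch).
    """
    scored = []
    for doc_index, text in enumerate(documents):
        freq = count_occurrences(text, pattern)
        if freq > 0:
            scored.append((doc_index, freq))
    # Sort by descending frequency, then ascending document index.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


if __name__ == "__main__":
    docs = ["abracadabra", "banana", "cabana", "xyz"]
    print(top_k_by_frequency(docs, "a", 2))
    print(top_k_by_frequency(docs, "ana", 2))
```

A baseline like this is mainly useful as a reference for checking the answers of an indexed solution on small inputs; other relevance measures, such as the minimal distance between two occurrences, would replace the scoring function while leaving the selection step unchanged.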