Towards an optimal space-and-query-time index for top-k document retrieval

  • Authors:
  • Wing-Kai Hon;Rahul Shah;Sharma V. Thankachan

  • Affiliations:
  • Department of CS, National Tsing Hua University, Taiwan;Department of CS, Louisiana State University;Department of CS, Louisiana State University

  • Venue:
  • CPM'12 Proceedings of the 23rd Annual conference on Combinatorial Pattern Matching
  • Year:
  • 2012

Abstract

Let $\mathcal{D} = \{d_1, d_2, \ldots, d_D\}$ be a given set of $D$ string documents of total length $n$. Our task is to index $\mathcal{D}$ such that the $k$ most relevant documents for an online query pattern $P$ of length $p$ can be retrieved efficiently. We propose an index of size $|CSA| + n\log D\,(2+o(1))$ bits with $O(t_s(p) + k\log\log n + \mathrm{poly}\log\log n)$ query time for the basic relevance metric term-frequency, where $|CSA|$ is the size (in bits) of a compressed full-text index of $\mathcal{D}$ that takes $O(t_s(p))$ time to search for a pattern of length $p$. We further reduce the space to $|CSA| + n\log D\,(1+o(1))$ bits; however, the query time then becomes $O(t_s(p) + k(\log\sigma\log\log n)^{1+\epsilon} + \mathrm{poly}\log\log n)$, where $\sigma$ is the alphabet size and $\epsilon > 0$ is any constant.
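To make the query concrete, the following is a minimal sketch of the top-k problem under the term-frequency metric. It is a naive scan-based baseline written only for illustration, not the compressed index proposed in the paper. The function names `count_occurrences` and `top_k_by_term_frequency`, the counting of overlapping occurrences, and the tie-breaking by smaller document id are all assumptions made here.

```python
import heapq


def count_occurrences(text: str, pattern: str) -> int:
    """Count occurrences of pattern in text, including overlapping ones."""
    if not pattern:
        return 0
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        # Advance by one position so overlapping matches are counted.
        start = text.find(pattern, start + 1)
    return count


def top_k_by_term_frequency(documents, pattern, k):
    """Return up to k (doc_id, frequency) pairs with the highest frequency.

    Documents in which the pattern does not occur are omitted.
    Ties are broken by smaller document id (an assumption of this sketch).
    """
    scored = []
    for doc_id, doc in enumerate(documents):
        freq = count_occurrences(doc, pattern)
        if freq > 0:
            scored.append((freq, doc_id))
    # Highest frequency first, then smallest document id.
    best = heapq.nsmallest(k, scored, key=lambda item: (-item[0], item[1]))
    return [(doc_id, freq) for freq, doc_id in best]


if __name__ == "__main__":
    docs = ["abracadabra", "banana", "cabana", "xyz"]
    print(top_k_by_term_frequency(docs, "ana", 2))
```

This baseline scans every document at query time, so its cost grows with the total length $n$. An index of the kind described in the abstract is designed to avoid exactly that, answering a query in time that depends on $p$ and $k$ plus lower-order terms.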