Optimizing top-k document retrieval strategies for block-max indexes

  • Authors:
  • Constantinos Dimopoulos;Sergey Nepomnyachiy;Torsten Suel

  • Affiliations:
  • Polytechnic Institute of NYU, Brooklyn, NY, USA;Polytechnic Institute of NYU, Brooklyn, NY, USA;Polytechnic Institute of NYU, Brooklyn, NY, USA

  • Venue:
  • Proceedings of the sixth ACM international conference on Web search and data mining
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

Large web search engines use significant hardware and energy resources to process hundreds of millions of queries each day, and a lot of research has focused on how to improve query processing efficiency. One general class of optimizations called early termination techniques is used in all major engines, and essentially involves computing top results without an exhaustive traversal and scoring of all potentially relevant index entries. Recent work in [9,7] proposed several early termination algorithms for disjunctive top-k query processing, based on a new augmented index structure called Block-Max Index that enables aggressive skipping in the index. In this paper, we build on this work by studying new algorithms and optimizations for Block-Max indexes that achieve significant performance gains over the work in [9,7]. We start by implementing and comparing Block-Max oriented algorithms based on the well-known Maxscore and WAND approaches. Then we study how to build better Block-Max index structures and design better index-traversal strategies, resulting in new algorithms that achieve a factor of 2 speed-up over the best results in [9] with acceptable space overheads. We also describe and evaluate a hierarchical algorithm for a new recursive Block-Max index structure.