Top-k aggregation using intersections of ranked inputs

  • Authors:
  • Ravi Kumar;Kunal Punera;Torsten Suel;Sergei Vassilvitskii

  • Affiliations:
  • Yahoo! Research, Sunnyvale, CA;Yahoo! Research, Sunnyvale, CA;Yahoo! Research, Sunnyvale, CA;Yahoo! Research, Sunnyvale, CA

  • Venue:
  • Proceedings of the Second ACM International Conference on Web Search and Data Mining
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

There has been considerable past work on efficiently computing top k objects by aggregating information from multiple ranked lists of these objects. An important instance of this problem is query processing in search engines: One has to combine information from several different posting lists (rankings) of web pages (objects) to obtain the top k web pages to answer user queries. Two particularly well-studied approaches to achieve efficiency in top-k aggregation include early-termination algorithms (e.g., TA and NRA) and preaggregation of some of the input lists. However, there has been little work on a rigorous treatment of combining these approaches. We generalize the TA and NRA algorithms to the case when preaggregated intersection lists are available in addition to the original lists. We show that our versions of TA and NRA continue to remain "instance optimal," a very strong optimality notion that is a highlight of the original TA and NRA algorithms. Using an index of millions of web pages and real-world search engine queries, we empirically characterize the performance gains offered by our new algorithms. We show that the practical benefits of intersection lists can be fully realized only with an early-termination algorithm.