Top-k aggregation using intersections of ranked inputs

Authors:
Ravi Kumar;Kunal Punera;Torsten Suel;Sergei Vassilvitskii
Affiliations:
Yahoo! Research, Sunnyvale, CA;Yahoo! Research, Sunnyvale, CA;Yahoo! Research, Sunnyvale, CA;Yahoo! Research, Sunnyvale, CA
Venue:
Proceedings of the Second ACM International Conference on Web Search and Data Mining
Year:
2009

Citing 22
Cited 12

Implementations of partial document ranking using inverted files

Information Processing and Management: an International Journal
Query evaluation: strategies and optimizations

Information Processing and Management: an International Journal
Filtered document retrieval with frequency-sorted indexes

Journal of the American Society for Information Science
Optimization of inverted vector searches

SIGIR '85 Proceedings of the 8th annual international ACM SIGIR conference on Research and development in information retrieval
Compressed inverted files with reduced decoding overheads

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Combining fuzzy information from multiple systems

Journal of Computer and System Sciences
Vector-space ranking with effective early termination

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Static index pruning for information retrieval systems

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Combining fuzzy information: an overview

ACM SIGMOD Record
Optimizing Multi-Feature Queries for Image Databases

VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Query Processing Issues in Image(Multimedia) Databases

ICDE '99 Proceedings of the 15th International Conference on Data Engineering
Optimal aggregation algorithms for middleware

Journal of Computer and System Sciences - Special issu on PODS 2001
On the integration of structure indexes and inverted lists

SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Fast phrase querying with combined indexes

ACM Transactions on Information Systems (TOIS)
Three-level caching for efficient query processing in large Web search engines

WWW '05 Proceedings of the 14th international conference on World Wide Web
Efficient Query Evaluation on Large Textual Collections in a Peer-to-Peer Environment

P2P '05 Proceedings of the Fifth IEEE International Conference on Peer-to-Peer Computing
Pruned query evaluation using pre-computed impacts

SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Answering top-k queries using views

VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Pruning policies for two-tiered inverted index with correctness guarantee

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Optimized query execution in large search engines with global page ordering

VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Introduction to Information Retrieval

Introduction to Information Retrieval
Efficient text proximity search

SPIRE'07 Proceedings of the 14th international conference on String processing and information retrieval

Revisiting globally sorted indexes for efficient document retrieval

Proceedings of the third ACM international conference on Web search and data mining
Early exit optimizations for additive machine learned ranking systems

Proceedings of the third ACM international conference on Web search and data mining
Consistent query answers in inconsistent probabilistic databases

Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Query forwarding in geographically distributed search engines

Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Efficient term proximity search with term-pair indexes

CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Batch query processing for web search engines

Proceedings of the fourth ACM international conference on Web search and data mining
Efficiently encoding term co-occurrences in inverted indexes

Proceedings of the 20th ACM international conference on Information and knowledge management
High-performance processing of text queries with tunable pruned term and term pair indexes

ACM Transactions on Information Systems (TOIS)
Efficient monitoring of personalized hot news over Web 2.0 streams

Computer Science - Research and Development
Efficient top-k document retrieval using a term-document binary matrix

AIRS'11 Proceedings of the 7th Asia conference on Information Retrieval Technology
Optimizing index for taxonomy keyword search

SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Context-aware top-K processing using views

Proceedings of the 22nd ACM international conference on Conference on information & knowledge management

Quantified Score

Hi-index	0.00

Visualization

Abstract

There has been considerable past work on efficiently computing top k objects by aggregating information from multiple ranked lists of these objects. An important instance of this problem is query processing in search engines: One has to combine information from several different posting lists (rankings) of web pages (objects) to obtain the top k web pages to answer user queries. Two particularly well-studied approaches to achieve efficiency in top-k aggregation include early-termination algorithms (e.g., TA and NRA) and preaggregation of some of the input lists. However, there has been little work on a rigorous treatment of combining these approaches. We generalize the TA and NRA algorithms to the case when preaggregated intersection lists are available in addition to the original lists. We show that our versions of TA and NRA continue to remain "instance optimal," a very strong optimality notion that is a highlight of the original TA and NRA algorithms. Using an index of millions of web pages and real-world search engine queries, we empirically characterize the performance gains offered by our new algorithms. We show that the practical benefits of intersection lists can be fully realized only with an early-termination algorithm.