Interval-based pruning for top-k processing over compressed lists

Authors:
Kaushik Chakrabarti; Surajit Chaudhuri; Venkatesh Ganti
Affiliations:
Microsoft Research, USA; Microsoft Research, USA; Google Inc., USA
Venue:
ICDE '11 Proceedings of the 2011 IEEE 27th International Conference on Data Engineering
Year:
2011

Cited by:

Faster top-k document retrieval using block-max indexes
Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval

Optimized top-k processing with global page scores on block-max indexes
Proceedings of the Fifth ACM International Conference on Web Search and Data Mining

A candidate filtering mechanism for fast top-k query processing on modern CPUs
Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval

Efficient parallel block-max WAND algorithm
Euro-Par '13 Proceedings of the 19th International Conference on Parallel Processing

Abstract

Optimizing execution of top-k queries over record-id ordered, compressed lists is challenging. The threshold family of algorithms cannot be effectively used in such cases. Yet, improving execution of such queries is of great value. For example, top-k keyword search in information retrieval (IR) engines represents an important scenario where such optimization can be directly beneficial. In this paper, we develop novel algorithms to improve execution of such queries over state-of-the-art techniques. Our main insights are pruning based on fine-granularity bounds and traversing the lists based on judiciously chosen "intervals" rather than individual records. We formally study the optimality characteristics of the proposed algorithms. Our algorithms require minimal changes and can be easily integrated into IR engines. Our experiments on real-life datasets show that our algorithms outperform the state-of-the-art techniques by a factor of 3 to 6 in terms of query execution times.
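
The two ideas named in the abstract, fine-granularity bounds and traversal by intervals rather than individual records, can be illustrated with a small sketch. This is not the authors' algorithm, only a simplified illustration under stated assumptions: each list is an uncompressed, doc-id-ordered sequence of (doc_id, score) pairs, a document's score is the sum of its per-list scores, a "block" is a fixed-size run of postings that stores its maximum score, and intervals are simply the doc-id ranges between consecutive block boundaries. The names build_blocks and interval_top_k are hypothetical.

```python
import heapq
from collections import defaultdict


def build_blocks(postings, block_size):
    """Split a doc-id-ordered posting list into blocks.

    Each block is (first_doc, last_doc, max_score, postings_in_block);
    max_score is the fine-granularity upper bound for that block.
    """
    blocks = []
    for i in range(0, len(postings), block_size):
        chunk = postings[i:i + block_size]
        blocks.append((chunk[0][0], chunk[-1][0],
                       max(score for _, score in chunk), chunk))
    return blocks


def interval_top_k(lists, k, block_size=2):
    """Return (top-k [(score, doc_id)], number of pruned intervals).

    Assumption: intervals are the half-open doc-id ranges between
    consecutive block boundaries, so within one interval each list
    contributes at most one block.
    """
    blocked = [build_blocks(p, block_size) for p in lists if p]
    cuts = sorted({b[0] for bl in blocked for b in bl} |
                  {b[1] + 1 for bl in blocked for b in bl})
    heap = []    # min-heap of (score, doc_id); heap[0] is the current threshold
    pruned = 0
    for lo, hi in zip(cuts, cuts[1:]):
        # Blocks that fully cover [lo, hi). A linear scan keeps the sketch
        # short; a real engine would advance one cursor per list instead.
        covering = [b for bl in blocked for b in bl
                    if b[0] <= lo and hi - 1 <= b[1]]
        if not covering:
            continue  # gap in the doc-id space, nothing to score
        bound = sum(b[2] for b in covering)
        if len(heap) == k and bound <= heap[0][0]:
            pruned += 1  # no document in this interval can enter the top-k
            continue
        # Only now touch the postings (where decompression would happen).
        totals = defaultdict(float)
        for _, _, _, chunk in covering:
            for doc, score in chunk:
                if lo <= doc < hi:
                    totals[doc] += score
        for doc, total in totals.items():
            if len(heap) < k:
                heapq.heappush(heap, (total, doc))
            elif total > heap[0][0]:
                heapq.heapreplace(heap, (total, doc))
    return sorted(heap, key=lambda t: (-t[0], t[1])), pruned


if __name__ == "__main__":
    list_a = [(1, 5), (2, 1), (7, 1), (9, 1)]
    list_b = [(1, 4), (3, 1), (8, 1), (9, 2)]
    print(interval_top_k([list_a, list_b], k=1))
```

In this sketch an interval is skipped as a whole whenever the sum of the block maxima covering it cannot exceed the current k-th best score, so its postings are never read. How the paper chooses its intervals and bounds, and how it handles compressed blocks, is not specified in the abstract and is not modeled here.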