Optimized top-k processing with global page scores on block-max indexes

Authors:
Dongdong Shan; Shuai Ding; Jing He; Hongfei Yan; Xiaoming Li
Affiliations:
Peking University, Beijing, China; Polytechnic Institute of NYU, New York, USA; Peking University, Beijing, China; Peking University, Beijing, China; Peking University, Beijing, China
Venue:
Proceedings of the fifth ACM international conference on Web search and data mining
Year:
2012

Abstract

Large web search engines face formidable performance challenges: they must process thousands of queries per second over tens of billions of documents within interactive response times. Top-k query processing (also called early termination or dynamic pruning) is an important class of optimization techniques that speeds up query processing by avoiding the scoring of documents that are unlikely to be among the top results. One recent technique is the Block-Max index, in which posting lists are organized into blocks and the maximum score within each block is stored to improve query efficiency. Although the Block-Max index speeds up query processing, the ranking function it has been used with is purely term-based, while documents' static scores are well known to be important for a good ranking function. In this paper, we show that the performance of state-of-the-art algorithms on the Block-Max index degrades when a static score is added to the ranking function. We then study efficient techniques for top-k query processing when a page's static score, such as PageRank, is given in addition to the term-based score. In particular, we propose a set of new algorithms based on WAND and MaxScore over the Block-Max index using local scores, which outperform the existing ones. We then propose new techniques for estimating a tighter score upper bound for each block. We also study search efficiency on different index structures in which document identifiers are assigned by URL sorting or by static document score. Experiments on TREC GOV2 and ClueWeb09B show considerable performance gains.
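The core mechanism described above, per-block maximum scores used as upper bounds while a static score is added to the ranking function, can be illustrated with a small sketch. This is a simplified illustration under stated assumptions, not the WAND- or MaxScore-based algorithms the paper proposes: it visits every candidate docID in order and uses block maxima only to decide whether a document needs full scoring. The combined score (sum of term scores plus `alpha` times a static score), the weight `alpha`, `BLOCK_SIZE`, the toy data, and all function names are hypothetical.

```python
import bisect
import heapq
from dataclasses import dataclass

BLOCK_SIZE = 4  # hypothetical; real block-max indexes typically use larger blocks


@dataclass
class BlockMaxList:
    """One term's posting list, split into fixed-size blocks."""
    doc_ids: list[int]      # sorted docIDs
    scores: list[float]     # precomputed term score per posting
    block_last: list[int]   # last docID in each block
    block_max: list[float]  # maximum term score in each block


def build_block_max_list(postings, block_size=BLOCK_SIZE):
    """postings: (docID, term score) pairs sorted by docID."""
    doc_ids = [d for d, _ in postings]
    scores = [s for _, s in postings]
    block_last, block_max = [], []
    for start in range(0, len(postings), block_size):
        block_last.append(doc_ids[start:start + block_size][-1])
        block_max.append(max(scores[start:start + block_size]))
    return BlockMaxList(doc_ids, scores, block_last, block_max)


def block_bound(lst, d):
    """Upper bound on lst's contribution to docID d: max of the block that would hold d."""
    i = bisect.bisect_left(lst.block_last, d)
    return lst.block_max[i] if i < len(lst.block_max) else 0.0  # 0.0 past the list's end


def term_score(lst, d):
    """Exact term score of docID d (0.0 if d is not in the list)."""
    i = bisect.bisect_left(lst.doc_ids, d)
    return lst.scores[i] if i < len(lst.doc_ids) and lst.doc_ids[i] == d else 0.0


def top_k(lists, static, alpha, k):
    """Disjunctive top-k over score(d) = sum of term scores + alpha * static[d].

    Returns ((score, docID) pairs best first, number of fully scored documents).
    """
    if k <= 0:
        return [], 0
    heap = []  # min-heap holding the current top-k (score, docID) pairs
    scored = 0
    for d in sorted(set().union(*(lst.doc_ids for lst in lists))):
        threshold = heap[0][0] if len(heap) == k else float("-inf")
        # Block maxima bound the term part; the static part is looked up exactly.
        bound = sum(block_bound(lst, d) for lst in lists) + alpha * static[d]
        if bound <= threshold:
            continue  # d cannot enter the top-k, so skip full scoring
        scored += 1
        s = sum(term_score(lst, d) for lst in lists) + alpha * static[d]
        if len(heap) < k:
            heapq.heappush(heap, (s, d))
        elif s > heap[0][0]:
            heapq.heapreplace(heap, (s, d))
    return sorted(heap, reverse=True), scored


# Hypothetical toy data: two query terms and a static score for docIDs 0..12.
TERM_A = build_block_max_list([(1, 3.0), (2, 2.5), (4, 2.0), (6, 1.8), (10, 0.2), (12, 0.1)])
TERM_B = build_block_max_list([(2, 1.0), (4, 1.2), (7, 0.3), (12, 0.2)])
STATIC = [0.0, 0.2, 0.4, 0.0, 0.6, 0.0, 0.1, 0.8, 0.0, 0.0, 0.3, 0.0, 0.5]

if __name__ == "__main__":
    result, scored = top_k([TERM_A, TERM_B], STATIC, alpha=0.5, k=2)
    print([d for _, d in result], scored)  # prints "[2, 4] 4"
```

With k=2, docIDs 7, 10 and 12 are never fully scored: the second block of `TERM_A` has a maximum of only 0.2, so their bounds stay below the threshold of 3.5. A real block-max implementation would go further and skip decoding such blocks altogether. Because the static score is added on top of the block maxima, documents with high static scores loosen the bound, which hints at why adding static scores can weaken block-max pruning.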