Upper-bound approximations for dynamic pruning

Authors:
Craig Macdonald;Iadh Ounis;Nicola Tonellotto
Affiliations:
University of Glasgow, Scotland, UK;University of Glasgow, Scotland, UK;Information Science and Technologies Institute, National Research Council of Italy (ISTI-CNR), Pisa, Italy
Venue:
ACM Transactions on Information Systems (TOIS)
Year:
2011

Abstract

Dynamic pruning strategies for information retrieval systems can increase querying efficiency without decreasing effectiveness by using upper bounds to safely omit scoring documents that are unlikely to make the final retrieved set. Often, such upper bounds are pre-calculated at indexing time for a given weighting model. However, this precludes changing, adapting or training the weighting model without recalculating the upper bounds. Instead, upper bounds should be approximated at querying time from various statistics of each term to allow on-the-fly adaptation of the applied retrieval strategy. Using uniform notation, this article formulates the problem of determining a term upper bound given a weighting model and discusses the limitations of existing approximations. Moreover, we propose an upper-bound approximation using a constrained nonlinear maximization problem. We prove that our proposed upper-bound approximation does not impact the retrieval effectiveness of several modern weighting models from different families. We also show the applicability of the approximation for the Markov Random Field proximity model. Finally, we empirically examine how the accuracy of the upper-bound approximation impacts the number of postings scored and the resulting efficiency in the context of several large Web test collections.
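
As a rough illustration of approximating a term upper bound at querying time from term statistics, the sketch below treats the bound as a constrained maximization of a weighting model over feasible (term frequency, document length) pairs. It is not the article's formulation. BM25 is used purely as an assumed example model, and the function names, the statistics chosen (`tf_max`, `dl_min`, `dl_max`) and the brute-force integer search are all hypothetical choices made for readability.

```python
import math


def bm25(tf, dl, df, n_docs, avg_dl, k1=1.2, b=0.75):
    """BM25 score of one term in one document (assumed example weighting model)."""
    idf = math.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))
    norm = k1 * (1.0 - b + b * dl / avg_dl)
    return idf * tf * (k1 + 1.0) / (tf + norm)


def approx_upper_bound(tf_max, dl_min, dl_max, df, n_docs, avg_dl):
    """Maximise bm25 over integer (tf, dl) subject to
    1 <= tf <= tf_max, dl_min <= dl <= dl_max and tf <= dl.

    Hypothetical helper: an exhaustive search stands in for a proper
    constrained nonlinear solver, which keeps the sketch dependency-free.
    """
    best = 0.0
    for tf in range(1, tf_max + 1):
        # A document cannot be shorter than the number of term occurrences.
        for dl in range(max(dl_min, tf), dl_max + 1):
            best = max(best, bm25(tf, dl, df, n_docs, avg_dl))
    return best


# Illustrative (made-up) postings for one term: (tf, document length).
postings = [(3, 50), (10, 120), (1, 5)]
bound = approx_upper_bound(tf_max=10, dl_min=5, dl_max=200,
                           df=len(postings), n_docs=1000, avg_dl=80.0)
# Every real posting satisfies the constraints, so none can exceed the bound.
safe = all(bm25(tf, dl, len(postings), 1000, 80.0) <= bound
           for tf, dl in postings)
```

Because the search covers every feasible pair, the bound can never fall below the score of an actual posting, which is the property that makes pruning safe. It may be loose, however, since no real document needs to attain the maximizing pair, and a looser bound generally means fewer documents can be skipped.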