Where to stop reading a ranked list?: threshold optimization using truncated score distributions

Authors:
Avi Arampatzis;Jaap Kamps;Stephen Robertson
Affiliations:
University of Amsterdam, Amsterdam, Netherlands;University of Amsterdam, Amsterdam, Netherlands;Microsoft Research, Cambridge, United Kingdom
Venue:
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Year:
2009



Abstract

Ranked retrieval has a particular disadvantage compared with traditional Boolean retrieval: there is no clear cut-off point at which to stop consulting results. This is a serious problem in some setups. We investigate and further develop methods for selecting the rank cut-off value that optimizes a given effectiveness measure. Assuming no input other than a system's output for a query (document scores and their distribution), the task is essentially a score-distributional threshold optimization problem. The recent trend in modeling score distributions is to use a normal-exponential mixture: normal for relevant and exponential for non-relevant document scores. We discuss the two main theoretical problems with the current model, support incompatibility and non-convexity, and develop new models that address them. The main contributions of the paper are two truncated normal-exponential models, which differ in how the out-truncated score ranges are handled. We conduct a range of experiments using the TREC 2007 and 2008 Legal Track data and show that the truncated models lead to significantly better results.
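
To make the idea of score-distributional threshold optimization concrete, the following is a minimal sketch, not the paper's method. It uses the plain (untruncated) normal-exponential mixture that the abstract describes as the current model, and it picks F1 as an example of "a given effectiveness measure". All parameter values (collection size, fraction of relevant documents, `mu`, `sigma`, `lam`) and all function names are hypothetical and chosen only for illustration. In practice the mixture parameters would have to be estimated from the observed scores.

```python
import math


def normal_sf(s, mu, sigma):
    """P(score > s) when relevant scores are modeled as normal(mu, sigma)."""
    return 0.5 * math.erfc((s - mu) / (sigma * math.sqrt(2.0)))


def exponential_sf(s, lam):
    """P(score > s) when non-relevant scores are modeled as exponential(lam)."""
    return 1.0 if s <= 0.0 else math.exp(-lam * s)


def expected_f1(threshold, n_docs, frac_rel, mu, sigma, lam):
    """Expected F1 if every document scoring above `threshold` is retrieved."""
    n_rel = frac_rel * n_docs
    n_non = n_docs - n_rel
    rel_retrieved = n_rel * normal_sf(threshold, mu, sigma)
    non_retrieved = n_non * exponential_sf(threshold, lam)
    denom = rel_retrieved + non_retrieved + n_rel
    # F1 = 2 * TP / (retrieved + relevant)
    return 0.0 if denom == 0.0 else 2.0 * rel_retrieved / denom


def best_threshold(candidates, **params):
    """Grid search: return the candidate threshold with the highest expected F1."""
    return max(candidates, key=lambda t: expected_f1(t, **params))


# Hypothetical mixture parameters, assumed here purely for illustration.
params = dict(n_docs=1000, frac_rel=0.1, mu=3.0, sigma=1.0, lam=1.0)
candidates = [i / 10 for i in range(0, 61)]  # thresholds 0.0, 0.1, ..., 6.0
best = best_threshold(candidates, **params)
print(f"threshold={best:.1f}  expected F1={expected_f1(best, **params):.3f}")
```

The grid search avoids any assumption about the shape of the objective, which matters because the abstract names non-convexity as one of the theoretical problems of the untruncated model. A truncated variant would replace the two survival functions with their truncated counterparts. That step is left out here because the abstract does not specify how the out-truncated ranges are handled.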