Fidelity, Soundness, and Efficiency of Interleaved Comparison Methods

  • Authors: Katja Hofmann, Shimon Whiteson, Maarten de Rijke
  • Affiliations: University of Amsterdam (all authors)
  • Venue: ACM Transactions on Information Systems (TOIS)
  • Year: 2013

Abstract

Ranker evaluation is central to research on search engines, be it to compare rankers or to provide feedback for learning to rank. Traditional evaluation approaches do not scale well because they require explicit relevance judgments of document-query pairs, which are expensive to obtain. A promising alternative is the use of interleaved comparison methods, which compare rankers using click data obtained when interleaving their rankings. In this article, we propose a framework for analyzing interleaved comparison methods. An interleaved comparison method has fidelity if the expected outcome of ranker comparisons properly corresponds to the true relevance of the ranked documents. It is sound if its estimates of that expected outcome are unbiased and consistent. It is efficient if those estimates are accurate even when little data is available. We analyze existing interleaved comparison methods and find that, while sound, none meet our criteria for fidelity. We propose a probabilistic interleave method, which is sound and has fidelity. We show empirically that, by marginalizing out variables that are known, it is more efficient than existing interleaved comparison methods. Using importance sampling, we derive a sound extension that is able to reuse historical data collected in previous comparisons of other ranker pairs.
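
To make the abstract's key ideas concrete, the following is a minimal sketch of probabilistic interleaving in Python. It is an illustration, not the authors' implementation: it assumes both rankers rank the same document set (and that the interleaved list is no longer than that set), uses a rank-based softmax P(d) ∝ 1/rank(d)^τ with τ = 3 (the parameterization discussed for probabilistic interleave), and credits clicks to the single sampled assignment rather than marginalizing over all possible assignments, which is the refinement the abstract credits for the method's efficiency. All names (`softmax_probs`, `probabilistic_interleave`, `naive_outcome`) are hypothetical.

```python
import random

TAU = 3.0  # rank-based softmax decay; tau = 3 is the commonly cited setting

def softmax_probs(ranking, tau=TAU):
    """Map each document to P(d) proportional to 1 / rank(d)^tau (rank starts at 1)."""
    weights = [1.0 / rank ** tau for rank in range(1, len(ranking) + 1)]
    total = sum(weights)
    return {doc: w / total for doc, w in zip(ranking, weights)}

def sample_doc(probs, shown):
    """Sample one not-yet-shown document, renormalizing over the remainder."""
    candidates = [(d, p) for d, p in probs.items() if d not in shown]
    total = sum(p for _, p in candidates)
    r = random.random() * total
    cum = 0.0
    for d, p in candidates:
        cum += p
        if r < cum:
            return d
    return candidates[-1][0]  # guard against floating-point rounding

def probabilistic_interleave(ranking_a, ranking_b, length):
    """Fill each slot by a fair coin flip between the two rankers,
    then a draw from the chosen ranker's softmax distribution."""
    probs_a, probs_b = softmax_probs(ranking_a), softmax_probs(ranking_b)
    interleaved, assignments, shown = [], [], set()
    for _ in range(length):
        use_a = random.random() < 0.5
        doc = sample_doc(probs_a if use_a else probs_b, shown)
        interleaved.append(doc)
        assignments.append("a" if use_a else "b")
        shown.add(doc)
    return interleaved, assignments

def naive_outcome(assignments, clicked_slots):
    """Credit each click to the ranker that filled the clicked slot.
    Negative favors ranker a, positive favors ranker b, zero is a tie."""
    return sum(1 if assignments[i] == "b" else -1 for i in clicked_slots)

# Hypothetical usage:
# lst, asg = probabilistic_interleave(["d1", "d2", "d3"], ["d3", "d1", "d2"], 3)
# outcome = naive_outcome(asg, clicked_slots=[0])
```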
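
The importance-sampling extension mentioned at the end of the abstract can be sketched the same way. Under the same assumptions, each historical observation is reweighted by the ratio of its probability under the target ranker pair's interleaving distribution to its probability under the source pair's distribution. `interleave_probability` and `importance_weighted_estimate` are hypothetical helpers building on `softmax_probs` from the sketch above.

```python
def interleave_probability(ranking_a, ranking_b, lst, asg):
    """Probability that probabilistic interleaving of (ranking_a, ranking_b)
    produces exactly this list with this assignment. A shared document set
    is assumed, so every shown document has mass under both rankers."""
    probs_a, probs_b = softmax_probs(ranking_a), softmax_probs(ranking_b)
    p, shown = 1.0, set()
    for doc, who in zip(lst, asg):
        probs = probs_a if who == "a" else probs_b
        remaining = sum(pr for d, pr in probs.items() if d not in shown)
        p *= 0.5 * probs[doc] / remaining  # coin flip times renormalized draw
        shown.add(doc)
    return p

def importance_weighted_estimate(samples, p_target, p_source):
    """Estimate the expected outcome under the target interleaving
    distribution from samples drawn under the source distribution.
    Each sample is a (list, assignment, outcome) triple."""
    total = 0.0
    for lst, asg, outcome in samples:
        total += outcome * p_target(lst, asg) / p_source(lst, asg)
    return total / len(samples)

# Hypothetical usage: logs collected for an old ranker pair are reused to
# compare a new pair without collecting additional interleaving data.
# p_src = lambda l, a: interleave_probability(old_a, old_b, l, a)
# p_tgt = lambda l, a: interleave_probability(new_a, new_b, l, a)
# score = importance_weighted_estimate(historical_samples, p_tgt, p_src)
```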