Estimating interleaved comparison outcomes from historical click data

Authors: Katja Hofmann, Shimon Whiteson, Maarten de Rijke
Affiliation: University of Amsterdam, Amsterdam, Netherlands (all authors)
Venue: Proceedings of the 21st ACM International Conference on Information and Knowledge Management
Year: 2012

Abstract

Interleaved comparison methods, which compare rankers using click data, are a promising alternative to traditional information retrieval evaluation methods that require expensive explicit judgments. A major limitation of these methods is that they assume access to live data, meaning that new data must be collected for every pair of rankers compared. We investigate the use of previously collected click data (i.e., historical data) for interleaved comparisons. We start by analyzing to what degree existing interleaved comparison methods can be applied and find that a recent probabilistic method allows such data reuse, even though it is biased when applied to historical data. We then propose an interleaved comparison method that is based on the probabilistic approach but uses importance sampling to compensate for bias. We experimentally confirm that probabilistic methods make the use of historical data for interleaved comparisons possible and effective.
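The abstract describes estimating the outcome of an interleaved comparison between a new pair of rankers from clicks logged while a different pair was interleaved, with importance sampling correcting for the mismatch. The following is a minimal Python sketch of that idea. It is not the authors' implementation, and it rests on assumptions the abstract does not state. Each ranker is turned into a distribution with P(d) proportional to 1/rank(d)^tau. Each position is filled by a fair coin flip choosing the contributing ranker. Clicks depend only on the displayed list, so they cancel in the importance weight. The outcome is +1, -1, or 0 by comparing click counts. The function names (`softmax_probs`, `interleave`, `list_probability`, `outcome`, `is_estimate`), the tau = 3 default, and the simulated user are hypothetical. The sketch weights each logged impression by the ratio of its probability under the target pair to its probability under the source pair. It omits any further refinements (such as marginalizing over assignments) that the paper may use.

```python
import random
from typing import Dict, List, Sequence, Tuple

# One logged impression: displayed documents, which ranker contributed each
# position (0 = ranker A, 1 = ranker B), and a click flag per position.
LogEntry = Tuple[List[str], List[int], List[bool]]
RankerPair = Tuple[Sequence[str], Sequence[str]]


def softmax_probs(ranking: Sequence[str], remaining: List[str], tau: float) -> Dict[str, float]:
    """P(d) proportional to 1 / rank(d)**tau, renormalised over the remaining docs."""
    scores = {d: 1.0 / (ranking.index(d) + 1) ** tau for d in remaining}
    total = sum(scores.values())
    return {d: s / total for d, s in scores.items()}


def interleave(rng: random.Random, ra: Sequence[str], rb: Sequence[str],
               tau: float) -> Tuple[List[str], List[int]]:
    """Fill each position by flipping a fair coin for the contributing ranker,
    then sampling a not-yet-shown document from that ranker's softmax.
    Assumes both rankers order the same document set."""
    remaining = list(ra)
    shown: List[str] = []
    assignments: List[int] = []
    while remaining:
        who = rng.randrange(2)
        probs = softmax_probs((ra, rb)[who], remaining, tau)
        docs = list(probs)
        doc = rng.choices(docs, weights=[probs[d] for d in docs])[0]
        remaining.remove(doc)
        shown.append(doc)
        assignments.append(who)
    return shown, assignments


def list_probability(ra: Sequence[str], rb: Sequence[str], shown: List[str],
                     assignments: List[int], tau: float) -> float:
    """Probability that interleaving (ra, rb) yields exactly this list and assignment."""
    remaining = list(ra)
    p = 1.0
    for doc, who in zip(shown, assignments):
        p *= 0.5 * softmax_probs((ra, rb)[who], remaining, tau)[doc]
        remaining.remove(doc)
    return p


def outcome(assignments: List[int], clicks: List[bool]) -> int:
    """+1 if ranker A collects more clicks, -1 if ranker B does, 0 on a tie."""
    a = sum(c for who, c in zip(assignments, clicks) if who == 0)
    b = sum(c for who, c in zip(assignments, clicks) if who == 1)
    return (a > b) - (a < b)


def is_estimate(logs: List[LogEntry], source: RankerPair, target: RankerPair,
                tau: float = 3.0) -> float:
    """Importance-sampling estimate of the target pair's expected outcome from
    impressions logged while interleaving the source pair. Assumes clicks depend
    only on the displayed list, so the click probability cancels in the weight."""
    total = 0.0
    for shown, assignments, clicks in logs:
        w = (list_probability(*target, shown, assignments, tau)
             / list_probability(*source, shown, assignments, tau))
        total += w * outcome(assignments, clicks)
    return total / len(logs)


# Usage: log impressions for a source pair, then evaluate a different target pair.
rng = random.Random(0)
source = (["d1", "d2", "d3"], ["d2", "d1", "d3"])
target = (["d1", "d2", "d3"], ["d3", "d2", "d1"])
logs: List[LogEntry] = []
for _ in range(2000):
    shown, assignments = interleave(rng, *source, tau=3.0)
    # Hypothetical user: clicks d1 with probability 0.7 and never clicks anything else.
    clicks = [doc == "d1" and rng.random() < 0.7 for doc in shown]
    logs.append((shown, assignments, clicks))
print(is_estimate(logs, source, target))
```

The sketch relies on the softmax giving every document nonzero probability. As a result, every list the target pair can produce also has nonzero probability under the source pair, and that coverage is what keeps the weighted average unbiased in expectation. The weights can still become large when the two pairs disagree strongly, which increases the variance of the estimate.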