Using historical click data to increase interleaving sensitivity

Authors:
Eugene Kharitonov;Craig Macdonald;Pavel Serdyukov;Iadh Ounis
Affiliations:
Yandex & University of Glasgow, Moscow, Russian Fed.;University of Glasgow, Glasgow, United Kingdom;Yandex, Moscow, Russian Fed.;University of Glasgow, Glasgow, United Kingdom
Venue:
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Year:
2013

Citing 18
Cited 0

Introduction to Reinforcement Learning

Introduction to Reinforcement Learning
The Philosophy of Information Retrieval Evaluation

CLEF '01 Revised Papers from the Second Workshop of the Cross-Language Evaluation Forum on Evaluation of Cross-Language Information Retrieval Systems
Optimizing search engines using clickthrough data

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Eye-tracking analysis of user behavior in WWW search

Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Pattern Recognition and Machine Learning (Information Science and Statistics)

Pattern Recognition and Machine Learning (Information Science and Statistics)
Predicting clicks: estimating the click-through rate for new ads

Proceedings of the 16th international conference on World Wide Web
An experimental comparison of click position-bias models

WSDM '08 Proceedings of the 2008 International Conference on Web Search and Data Mining
How does clickthrough data reflect retrieval quality?

Proceedings of the 17th ACM conference on Information and knowledge management
A dynamic bayesian network click model for web search ranking

Proceedings of the 18th international conference on World wide web
Click chain model in web search

Proceedings of the 18th international conference on World wide web
Learning more powerful test statistics for click-based retrieval evaluation

Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Comparing the sensitivity of information retrieval metrics

Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Overlapping experiment infrastructure: more, better, faster experimentation

Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
Inferring and using location metadata to personalize web search

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
A probabilistic method for inferring preferences from clicks

Proceedings of the 20th ACM international conference on Information and knowledge management
Large-scale validation and analysis of interleaved search evaluation

ACM Transactions on Information Systems (TOIS)
Estimating interleaved comparison outcomes from historical click data

Proceedings of the 21st ACM international conference on Information and knowledge management
Optimized interleaving for online retrieval evaluation

Proceedings of the sixth ACM international conference on Web search and data mining

Quantified Score

Hi-index	0.00

Visualization

Abstract

Interleaving is an online evaluation method to compare two alternative ranking functions based on the users' implicit feedback. In an interleaving experiment, the results from two ranking functions are merged in a single result list and presented to the users. The users' click feedback on the merged result list is analysed to derive preferences over the ranking functions. An important property of interleaving methods is their sensitivity, i.e. their ability to reliably derive the comparison outcome with a relatively small amount of user behaviour data. This allows testing of changes in the search engine ranking functions frequently and, as a result, rapid iterations in developing search quality improvements can be achieved. In this paper we propose a novel approach to further improve interleaving sensitivity by using pre-experimental user behaviour data. In particular, the click history is used to train a click model, which is then used to predict which interleaved result pages are likely to contribute to the experiment outcome. The probabilities of presenting these interleaved result pages to the users are then optimised, such that the sensitivity of interleaving is maximised. In order to evaluate the proposed approach, we re-use data from six actual interleaving experiments, previously performed by a commercial search engine. Our results demonstrate that the proposed approach outperforms a state-of-the-art baseline, achieving up to a median of 48% reduction in the number of impressions for the same level of confidence.