Large-scale validation and analysis of interleaved search evaluation

  • Authors: Olivier Chapelle; Thorsten Joachims; Filip Radlinski; Yisong Yue
  • Affiliations: Yahoo! Research; Cornell University; Microsoft; Carnegie Mellon University
  • Venue: ACM Transactions on Information Systems (TOIS)
  • Year: 2012

Abstract

Interleaving is an increasingly popular technique for evaluating information retrieval systems based on implicit user feedback. While a number of isolated studies have analyzed how this technique agrees with conventional offline evaluation approaches and other online techniques, a complete picture of its efficiency and effectiveness is still lacking. In this paper we extend and combine the body of empirical evidence regarding interleaving, and provide a comprehensive analysis of interleaving using data from two major commercial search engines and a retrieval system for scientific literature. In particular, we analyze the agreement of interleaving with manual relevance judgments and observational implicit feedback measures, estimate the statistical efficiency of interleaving, and explore the relative performance of different interleaving variants. We also show how to learn improved credit-assignment functions for clicks that further increase the sensitivity of interleaving.
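For readers unfamiliar with the technique, the sketch below illustrates team-draft interleaving, one of the interleaving variants the paper compares: the two rankers being evaluated alternately contribute their best not-yet-shown result to a single combined list, and a click is credited to the ranker whose "team" contributed the clicked result. This is a minimal, illustrative sketch; the function names are assumptions, and the one-credit-per-click rule shown here is the simplest baseline, not the learned credit-assignment functions studied in the paper.

```python
import random

def team_draft_interleave(ranking_a, ranking_b, length=10):
    """Merge two rankings by team-draft interleaving.

    Returns the interleaved result list and a dict mapping each shown
    result to the ranker ('A' or 'B') that contributed it.
    """
    interleaved = []
    team = {}  # result -> 'A' or 'B'
    a, b = list(ranking_a), list(ranking_b)
    count_a = count_b = 0
    while len(interleaved) < length:
        # Drop results already placed in the interleaved list.
        a = [d for d in a if d not in team]
        b = [d for d in b if d not in team]
        if not a and not b:
            break
        # The ranker with fewer contributions picks next (coin flip on ties),
        # falling back to the other ranker if this one is exhausted.
        a_turn = bool(a) and (not b
                              or count_a < count_b
                              or (count_a == count_b and random.random() < 0.5))
        if a_turn:
            doc = a.pop(0); team[doc] = 'A'; count_a += 1
        else:
            doc = b.pop(0); team[doc] = 'B'; count_b += 1
        interleaved.append(doc)
    return interleaved, team


def credit(clicked_results, team):
    """Baseline credit assignment: one credit per click on a result
    contributed by each ranker; the ranker with more credit wins the query."""
    wins_a = sum(1 for d in clicked_results if team.get(d) == 'A')
    wins_b = sum(1 for d in clicked_results if team.get(d) == 'B')
    return 'A' if wins_a > wins_b else 'B' if wins_b > wins_a else 'tie'
```

In an online comparison, the per-query outcomes ('A', 'B', or 'tie') are aggregated over many queries, and a sign or binomial test decides whether one ranker is reliably preferred; the paper's learned credit-assignment functions replace the simple click count above with weighted click features to increase sensitivity.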