Learning more powerful test statistics for click-based retrieval evaluation

  • Authors:
  • Yisong Yue (Cornell University, Ithaca, NY, USA); Yue Gao (Cornell University, Ithaca, NY, USA); Olivier Chapelle (Yahoo! Incorporated, Santa Clara, CA, USA); Ya Zhang (Shanghai Jiao Tong University, Shanghai, China); Thorsten Joachims (Cornell University, Ithaca, NY, USA)

  • Venue:
  • Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2010

Abstract

Interleaving experiments are an attractive methodology for evaluating retrieval functions through implicit feedback. Designed as a blind and unbiased test for eliciting a preference between two retrieval functions, an interleaving experiment presents users with an interleaved ranking of the results of both functions and then observes whether users click more on results from one retrieval function or the other. While such interleaving experiments have been shown to reliably identify the better of the two retrieval functions, the naive approach of counting all clicks equally leads to a suboptimal test. We present new methods for learning how to score different types of clicks so that the resulting test statistic optimizes the statistical power of the experiment. This can lead to substantial savings in the amount of data required to reach a target confidence level. Our methods are evaluated on an operational search engine over a collection of scientific articles.
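
To make the idea of a learned test statistic concrete, the sketch below is an illustration in Python rather than the paper's exact formulation: each interleaved query is reduced to a small feature vector of click-type count differences between the two retrieval functions, the vector is scored with a weighted sum, and the resulting z-statistic under uniform weights (the naive count-all-clicks-equally test) is compared against weights chosen to maximize the signal-to-noise ratio. The closed-form choice w ∝ Σ⁻¹μ, the `ridge` term, the function names, and the three synthetic click types are assumptions made for this example.

```python
import numpy as np

def z_statistic(weights, X):
    """z-score of the weighted per-query credit differences.

    X has shape (n_queries, n_click_types); each row holds, for one
    interleaved query, the difference in click counts of each type
    between retrieval functions A and B (positive values favor A).
    """
    s = X @ weights
    return np.sqrt(len(s)) * s.mean() / s.std(ddof=1)

def learn_weights(X, ridge=1e-6):
    """Weights that maximize the squared z-score of the linear statistic.

    The signal-to-noise ratio (w'mu)^2 / (w'Sigma w) is maximized,
    up to scale, by w = Sigma^{-1} mu; the ridge term keeps the
    covariance estimate invertible.
    """
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False) + ridge * np.eye(X.shape[1])
    w = np.linalg.solve(cov, mu)
    return w / np.linalg.norm(w)

# Hypothetical data: 1000 interleaved queries with three click types,
# e.g. clicks at rank 1, clicks below rank 1, clicks after a reformulation.
rng = np.random.default_rng(0)
X = rng.normal(loc=[0.05, 0.02, 0.0], scale=[0.4, 0.3, 2.0], size=(1000, 3))

naive = np.ones(3) / np.sqrt(3)   # count every click type equally
print("z with uniform weights:", z_statistic(naive, X))
print("z with learned weights:", z_statistic(learn_weights(X), X))
```

In this toy setup the learned weights down-weight the noisy click type and up-weight the informative ones, so the same number of queries yields a larger z-score; equivalently, a target confidence level is reached with fewer queries, which is the kind of data saving the abstract refers to.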