The Cranfield evaluation methodology has notable disadvantages, including its high labor cost and its inadequacy for evaluating interactive retrieval techniques. A promising alternative, recently the subject of much study, is the automatic comparison of retrieval systems based on users' observed clicking behavior. Several such comparison methods have been proposed, but there has so far been no systematic way to assess which strategy is better, making it difficult to choose a suitable method for real applications. In this paper, we propose a general way to evaluate these relative comparison methods with two measures: utility to users (UtU) and effectiveness of differentiation (EoD). We evaluate two state-of-the-art methods by systematically simulating different retrieval scenarios. Motivated by the weaknesses of these methods that our evaluation reveals, we further propose a novel method that takes the positions of clicked documents into account. Experimental results show that the new method outperforms the existing ones.
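The abstract does not spell out the comparison methods it evaluates, but the general idea of click-based relative comparison can be illustrated with Team Draft interleaving. The sketch below is a minimal Python illustration, not the paper's method: `team_draft_interleave` is standard Team Draft, while `credit_clicks` and its reciprocal-rank `discount` are hypothetical stand-ins for the position-aware crediting the abstract alludes to.

```python
import random

def team_draft_interleave(ranking_a, ranking_b, length=10):
    """Team Draft interleaving: each round a coin flip decides which
    ranker picks first; each ranker then contributes its highest-ranked
    document not yet in the interleaved list."""
    interleaved, teams, seen = [], [], set()
    while len(interleaved) < length:
        added = 0
        first = random.choice("AB")
        for team in (first, "B" if first == "A" else "A"):
            ranking = ranking_a if team == "A" else ranking_b
            doc = next((d for d in ranking if d not in seen), None)
            if doc is None:
                continue  # this ranker has no fresh documents left
            interleaved.append(doc)
            teams.append(team)
            seen.add(doc)
            added += 1
            if len(interleaved) == length:
                break
        if added == 0:  # both rankings exhausted
            break
    return interleaved, teams

def credit_clicks(teams, clicked_ranks, discount=lambda r: 1.0 / (r + 1)):
    """Hypothetical position-discounted crediting: a click at rank r
    (0-indexed) earns discount(r) for the team that contributed the
    clicked document, so clicks far down the list count for less."""
    score_a = score_b = 0.0
    for r in clicked_ranks:
        if teams[r] == "A":
            score_a += discount(r)
        else:
            score_b += discount(r)
    return score_a, score_b

# Example: interleave two four-document rankings, then credit a
# simulated user who clicked the results at ranks 0 and 2.
a = ["d1", "d2", "d3", "d4"]
b = ["d3", "d1", "d5", "d2"]
interleaved, teams = team_draft_interleave(a, b, length=4)
print(interleaved, teams, credit_clicks(teams, [0, 2]))
```

Repeating this over many simulated query impressions and declaring the ranker with the higher aggregate credit the winner is the kind of relative comparison the paper's UtU and EoD measures are designed to evaluate.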