Ranker evaluation is central to search engine research, whether to compare rankers or to provide feedback for learning to rank. Traditional evaluation approaches do not scale well because they require explicit relevance judgments of document-query pairs, which are expensive to obtain. A promising alternative is the use of interleaved comparison methods, which compare rankers using click data obtained when interleaving their rankings. In this article, we propose a framework for analyzing interleaved comparison methods. An interleaved comparison method has fidelity if the expected outcome of ranker comparisons properly corresponds to the true relevance of the ranked documents. It is sound if its estimates of that expected outcome are unbiased and consistent. It is efficient if those estimates are accurate even with little data. We analyze existing interleaved comparison methods and find that, while sound, none meet our criteria for fidelity. We propose a probabilistic interleave method that is sound and has fidelity. We show empirically that, by marginalizing out variables that are known, it is more efficient than existing interleaved comparison methods. Using importance sampling, we derive a sound extension that can reuse historical data collected in previous comparisons of other ranker pairs.
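To make the core mechanism concrete, the sketch below illustrates the basic idea of probabilistic interleaving: each ranker is softened into a probability distribution over documents that decays with rank (here P(d) proportional to 1/rank^tau, with tau = 3 as in the article), the shown list is built by repeatedly coin-flipping between the two distributions and sampling without replacement, and clicks are credited to the ranker that contributed each clicked document. This is a minimal, hypothetical illustration, not the article's full method: the names, the list length, and the per-click credit rule are assumptions, and the article's refinements (marginalizing over possible ranker assignments, and reweighting historical clicks with importance sampling) are omitted for brevity.

```python
import random
from collections import defaultdict

TAU = 3.0  # assumed softmax steepness over ranks (tau = 3 in the article)

def softmax_over_ranks(ranking, tau=TAU):
    """Turn a ranked list into a distribution over its documents,
    with P(d) proportional to 1 / rank(d)^tau."""
    weights = {d: 1.0 / (r + 1) ** tau for r, d in enumerate(ranking)}
    z = sum(weights.values())
    return {d: w / z for d, w in weights.items()}

def probabilistic_interleave(ranking_a, ranking_b, length=10):
    """Build an interleaved list: at each position, flip a fair coin to
    choose a ranker, then sample a not-yet-shown document from that
    ranker's softmax distribution (renormalized over unseen documents)."""
    dist_a = softmax_over_ranks(ranking_a)
    dist_b = softmax_over_ranks(ranking_b)
    interleaved, assignments, shown = [], [], set()
    while len(interleaved) < length:
        ranker = random.choice(("a", "b"))
        dist = dist_a if ranker == "a" else dist_b
        candidates = {d: p for d, p in dist.items() if d not in shown}
        if not candidates:
            break
        total = sum(candidates.values())
        r, acc = random.random() * total, 0.0
        for d, p in candidates.items():
            acc += p
            if r <= acc:
                break
        interleaved.append(d)
        assignments.append(ranker)
        shown.add(d)
    return interleaved, assignments

def compare(assignments, clicked_positions):
    """Naive (non-marginalized) outcome: credit each click to the ranker
    whose distribution produced that position.  Returns 1 if ranker a
    wins, -1 if ranker b wins, 0 for a tie."""
    credit = defaultdict(int)
    for pos in clicked_positions:
        credit[assignments[pos]] += 1
    return (credit["a"] > credit["b"]) - (credit["a"] < credit["b"])
```

Aggregating these per-impression outcomes over many queries yields the comparison estimate; the article's contribution is to replace the naive credit rule with an expectation over all ranker assignments consistent with the shown list, which reduces variance, and to reweight clicks gathered under other interleaved distributions via importance sampling so that historical data can be reused.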