Online learning to rank for information retrieval (IR) holds promise for the development of "self-learning" search engines that automatically adapt to their users. Given the large amounts of interaction data (e.g., clicks) that can be collected in web search settings, such techniques could enable highly scalable ranking optimization. However, feedback obtained from user interactions is noisy, and developing approaches that can learn from this feedback quickly and reliably is a major challenge. In this paper we investigate whether and how previously collected (historical) interaction data can be used to speed up online learning to rank for IR. We devise the first two methods that use historical data (1) to make the feedback available during learning more reliable and (2) to preselect candidate ranking functions to be evaluated in interactions with users of the retrieval system. We evaluate both approaches on nine learning to rank data sets and find that historical data can speed up learning, leading to substantially and significantly higher online performance. In particular, our preselection method proves highly effective at compensating for noise in user feedback. Our results show that historical data can make online learning to rank for IR far more effective than previously possible, especially when feedback is noisy.
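The candidate-preselection idea can be illustrated with a minimal sketch: score each candidate ranker against logged clicks before spending live user interactions on it. This is not the paper's actual estimator (which uses probabilistic interleaving over historical data); the log format, the reciprocal-rank scoring, and all function names here are simplifying assumptions for illustration only.

```python
# Hypothetical historical log: each record lists the documents retrieved for a
# query and the set of documents the user clicked. (Illustrative format, not
# the paper's.)
HISTORY = [
    {"docs": ["d1", "d2", "d3", "d4"], "clicked": {"d2", "d4"}},
    {"docs": ["d1", "d2", "d3", "d4"], "clicked": {"d1"}},
    {"docs": ["d1", "d2", "d3", "d4"], "clicked": {"d2"}},
]

def estimated_quality(ranker, history):
    """Estimate a candidate ranker's quality from historical clicks: mean
    reciprocal rank of clicked documents under the candidate's ordering.
    A simplified stand-in for the paper's click-based estimators."""
    total = 0.0
    for record in history:
        ranking = ranker(record["docs"])
        for pos, doc in enumerate(ranking, start=1):
            if doc in record["clicked"]:
                total += 1.0 / pos
    return total / len(history)

def preselect(candidates, history, k=1):
    """Keep only the k candidates that look best on historical data, so
    fewer live interactions are spent evaluating poor rankers."""
    return sorted(candidates,
                  key=lambda r: estimated_quality(r, history),
                  reverse=True)[:k]
```

For example, a ranker that orders documents alphabetically scores 0.75 on the toy log above and would be preselected over one that ranks them in reverse; only the winner would then be evaluated in live interactions.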