Contextual bandit algorithms have become popular for online recommendation systems such as Digg, Yahoo! Buzz, and news recommendation in general. Offline evaluation of the effectiveness of new algorithms in these applications is critical for protecting online user experiences, but it is very challenging due to their "partial-label" nature: for each logged event, only the reward of the action actually taken is observed. Common practice is to create a simulator that mimics the online environment for the problem at hand and then run an algorithm against this simulator. However, creating the simulator itself is often difficult, and it usually introduces unavoidable modeling bias. In this paper, we introduce a replay methodology for contextual bandit algorithm evaluation. Unlike simulator-based approaches, our method is completely data-driven and easy to adapt to different applications. More importantly, it provides provably unbiased evaluations. Our empirical results on a large-scale news article recommendation dataset collected from the Yahoo! Front Page agree well with our theoretical results. Furthermore, comparisons between our offline replay and online bucket evaluation of several contextual bandit algorithms demonstrate the accuracy and effectiveness of our offline evaluation method.
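The core of the replay methodology can be illustrated with a short sketch. The idea, under the assumption that the logged data was collected with uniformly random arm selection, is to stream over logged (context, arm, reward) events: whenever the candidate policy's choice matches the logged arm, the event's reward counts toward the estimate and the policy is updated; otherwise the event is skipped. All names here (`replay_evaluate`, the `policy` interface, the event tuple layout) are illustrative, not part of any published API.

```python
class AlwaysFirstPolicy:
    """Illustrative toy policy: always selects the first available arm."""

    def select(self, context, arms, history):
        return arms[0]

    def update(self, context, arm, reward):
        pass  # a real bandit policy would refine its model here


def replay_evaluate(policy, logged_events):
    """Replay-based offline evaluation of a contextual bandit policy.

    logged_events: iterable of (context, logged_arm, reward, arms) tuples,
    assumed to have been collected under uniformly random arm selection.
    Only events where the policy's choice matches the logged arm are
    "usable" and contribute to the average-reward estimate.
    """
    total_reward, matched = 0.0, 0
    history = []
    for context, logged_arm, reward, arms in logged_events:
        chosen = policy.select(context, arms, history)
        if chosen == logged_arm:  # usable event: reward is revealed
            total_reward += reward
            matched += 1
            history.append((context, logged_arm, reward))
            policy.update(context, logged_arm, reward)
        # otherwise the event is discarded, keeping the estimate unbiased
    return total_reward / matched if matched else 0.0
```

With uniform logging over K arms, roughly a 1/K fraction of events match and are retained, which is the price paid for an unbiased, simulator-free estimate.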