Contextual bandit algorithms have become popular for online recommendation systems such as Digg, Yahoo! Buzz, and news recommendation in general. Offline evaluation of the effectiveness of new algorithms in these applications is critical for protecting online user experiences, but it is very challenging due to their "partial-label" nature: for each logged event, only the reward of the action actually taken is observed. Common practice is to create a simulator that mimics the online environment for the problem at hand and then run an algorithm against this simulator. However, creating the simulator itself is often difficult, and it usually introduces unavoidable modeling bias. In this paper, we introduce a replay methodology for contextual bandit algorithm evaluation. Unlike simulator-based approaches, our method is completely data-driven and easy to adapt to different applications. More importantly, it provides provably unbiased evaluations. Our empirical results on a large-scale news article recommendation dataset collected from the Yahoo! Front Page agree well with our theoretical results. Furthermore, comparisons between our offline replay and online bucket evaluation of several contextual bandit algorithms demonstrate the accuracy and effectiveness of our offline evaluation method.
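The core of the replay methodology can be illustrated with a short sketch. The idea, under the assumption that the logged data was collected with uniformly random arm selection, is to stream over logged (context, arm, reward) events: whenever the candidate policy's choice matches the logged arm, the event's reward counts toward the estimate and the policy is updated; otherwise the event is skipped. All names here (`replay_evaluate`, the `policy` interface, the event tuple layout) are illustrative, not part of any published API.

```python
class AlwaysFirstPolicy:
    """Illustrative toy policy: always selects the first available arm."""

    def select(self, context, arms, history):
        return arms[0]

    def update(self, context, arm, reward):
        pass  # a real bandit policy would refine its model here


def replay_evaluate(policy, logged_events):
    """Replay-based offline evaluation of a contextual bandit policy.

    logged_events: iterable of (context, logged_arm, reward, arms) tuples,
    assumed to have been collected under uniformly random arm selection.
    Only events where the policy's choice matches the logged arm are
    "usable" and contribute to the average-reward estimate.
    """
    total_reward, matched = 0.0, 0
    history = []
    for context, logged_arm, reward, arms in logged_events:
        chosen = policy.select(context, arms, history)
        if chosen == logged_arm:  # usable event: reward is revealed
            total_reward += reward
            matched += 1
            history.append((context, logged_arm, reward))
            policy.update(context, logged_arm, reward)
        # otherwise the event is discarded, keeping the estimate unbiased
    return total_reward / matched if matched else 0.0
```

With uniform logging over K arms, roughly a 1/K fraction of events match and are retained, which is the price paid for an unbiased, simulator-free estimate.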