A modern Bayesian look at the multi-armed bandit

Authors:
Steven L. Scott
Affiliations:
-
Venue:
Applied Stochastic Models in Business and Industry
Year:
2010

Citing 0
Cited 7

A two-armed bandit based scheme for accelerated decentralized learning
IEA/AIE'11 Proceedings of the 24th international conference on Industrial engineering and other applications of applied intelligent systems conference on Modern approaches in applied intelligence - Volume Part II

Optimistic Bayesian sampling in contextual-bandit problems
The Journal of Machine Learning Research

Uncertainty in online experiments with dependent data: an evaluation of bootstrap methods
Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining

Accelerated Bayesian learning for decentralized two-armed bandit based decision making with applications to the Goore Game
Applied Intelligence

LASER: a scalable response prediction platform for online advertising
Proceedings of the 7th ACM international conference on Web search and data mining

Adaptive persuasive messaging to increase service retention: using persuasion profiles to increase the effectiveness of email reminders
Personal and Ubiquitous Computing

Designing and deploying online field experiments
Proceedings of the 23rd international conference on World wide web

Abstract

A multi-armed bandit is an experiment with the goal of accumulating rewards from a payoff distribution with unknown parameters that are to be learned sequentially. This article describes a heuristic for managing multi-armed bandits called randomized probability matching, which randomly allocates observations to arms according to the Bayesian posterior probability that each arm is optimal. Advances in Bayesian computation have made randomized probability matching easy to apply to virtually any payoff distribution. This flexibility frees the experimenter to work with payoff distributions that correspond to certain classical experimental designs that have the potential to outperform methods that are ‘optimal’ in simpler contexts. I summarize the relationships between randomized probability matching and several related heuristics that have been used in the reinforcement learning literature. Copyright © 2010 John Wiley & Sons, Ltd.
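
The allocation rule described in the abstract can be illustrated with a minimal sketch. It is not taken from the article: it assumes the simplest case of binary (success/failure) rewards with independent Beta(1, 1) priors on each arm, and the function names (`optimal_arm_probabilities`, `choose_arm`, `simulate`) are hypothetical. Drawing one value per arm from its posterior and playing the arm with the largest draw selects each arm with probability equal to its posterior probability of being optimal.

```python
import random


def optimal_arm_probabilities(successes, failures, draws=2000, rng=None):
    """Monte Carlo estimate of P(arm is optimal | data) for each arm.

    Assumes binary rewards and a Beta(1, 1) prior on every arm's success rate
    (an illustrative choice, not one prescribed by the article).
    """
    rng = rng or random.Random()
    n_arms = len(successes)
    wins = [0] * n_arms
    for _ in range(draws):
        # One joint draw from the posterior: Beta(1 + successes, 1 + failures) per arm.
        sample = [rng.betavariate(1 + successes[i], 1 + failures[i])
                  for i in range(n_arms)]
        wins[max(range(n_arms), key=sample.__getitem__)] += 1
    return [w / draws for w in wins]


def choose_arm(successes, failures, rng=None):
    """Pick an arm by taking a single posterior draw per arm and returning the argmax.

    The chance that a given arm is returned equals its posterior probability
    of being the best arm, which is the matching rule in the abstract.
    """
    rng = rng or random.Random()
    sample = [rng.betavariate(1 + s, 1 + f) for s, f in zip(successes, failures)]
    return max(range(len(sample)), key=sample.__getitem__)


def simulate(true_rates, rounds=2000, seed=0):
    """Run the sketch against hypothetical true success rates; return the counts."""
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    for _ in range(rounds):
        arm = choose_arm(successes, failures, rng)
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures
```

For example, `simulate([0.1, 0.5])` should end up allocating most observations to the second arm, since its posterior probability of being optimal grows as data accumulate. Richer payoff distributions of the kind the abstract mentions would replace the Beta draws with draws from whatever posterior the chosen model yields.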