Multi-armed bandit algorithms and empirical evaluation

Authors:
Joannès Vermorel; Mehryar Mohri
Affiliations:
École normale supérieure, Paris, France; Courant Institute of Mathematical Sciences, New York, NY
Venue:
ECML '05: Proceedings of the 16th European Conference on Machine Learning
Year:
2005

Abstract

In the multi-armed bandit problem, a gambler must decide which arm of a K-slot machine to pull in order to maximize the total reward over a series of trials. Many real-world learning and optimization problems can be modeled in this way. Several strategies have been proposed for this problem over the last two decades, but, to our knowledge, there has been no common evaluation of these algorithms. This paper provides a preliminary empirical evaluation of several multi-armed bandit algorithms. It also describes and analyzes a new algorithm, Poker (Price Of Knowledge and Estimated Reward), whose performance compares favorably to that of other existing algorithms in several experiments. One remarkable outcome of our experiments is that the most naive approach, the ε-greedy strategy, often proves hard to beat.
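
For readers unfamiliar with the ε-greedy strategy named in the abstract, the following is a minimal sketch of its usual textbook form, not the implementation evaluated in the paper. It assumes Bernoulli (0/1) rewards; the function name `epsilon_greedy`, the parameter `arm_means`, and the default `epsilon=0.1` are illustrative choices rather than anything taken from the paper.

```python
import random


def epsilon_greedy(arm_means, n_trials, epsilon=0.1, seed=0):
    """Play a K-armed Bernoulli bandit with the epsilon-greedy strategy.

    arm_means holds the (hypothetical) success probability of each arm;
    the player never reads it directly, only the sampled rewards.
    Returns (total_reward, pull_counts).
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k        # number of pulls per arm
    estimates = [0.0] * k   # running mean reward per arm
    total = 0.0

    for _ in range(n_trials):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(k)
        else:
            # Exploit: pick the arm with the highest estimated mean
            # (ties resolve to the lowest index).
            arm = max(range(k), key=lambda i: estimates[i])

        # Sample a 0/1 reward from the chosen arm.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0

        # Incremental update of the running mean for that arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward

    return total, counts
```

A call such as `epsilon_greedy([0.2, 0.5, 0.8], n_trials=1000)` returns the accumulated reward and how often each arm was pulled. Setting `epsilon=0` gives a purely greedy player that never explores, which illustrates why some exploration is needed: it can lock onto the first arm it tries.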