In the multi-armed bandit problem, a gambler must decide which arm of a K-slot machine to pull in each of a series of trials so as to maximize his total reward. Many real-world learning and optimization problems can be modeled in this way. Several strategies or algorithms have been proposed as solutions to this problem in the last two decades, but, to our knowledge, there has been no common evaluation of these algorithms. This paper provides a preliminary empirical evaluation of several multi-armed bandit algorithms. It also describes and analyzes a new algorithm, Poker (Price Of Knowledge and Estimated Reward), whose performance compares favorably to that of other existing algorithms in several experiments. One remarkable outcome of our experiments is that the most naive approach, the ε-greedy strategy, often proves hard to beat.
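As a point of reference for the ε-greedy strategy named above, the following is a minimal sketch of its usual textbook form: with probability ε pull a uniformly random arm, otherwise pull the arm with the highest empirical mean reward. The abstract does not specify the experimental setup, so the Bernoulli reward model, the function name `epsilon_greedy`, and the default ε = 0.1 are illustrative assumptions, not the paper's configuration.

```python
import random


def epsilon_greedy(arm_means, n_trials, epsilon=0.1, seed=0):
    """Simulate epsilon-greedy on K arms.

    Assumption (not from the paper): each arm pays 1 with probability
    arm_means[i] and 0 otherwise (Bernoulli rewards).
    Returns (total_reward, pull_counts, estimated_means).
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k          # number of pulls per arm
    estimates = [0.0] * k     # empirical mean reward per arm
    total_reward = 0.0

    for _ in range(n_trials):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(k)
        else:
            # Exploit: pick the arm with the highest empirical mean
            # (ties go to the lowest index).
            arm = max(range(k), key=lambda i: estimates[i])

        reward = 1.0 if rng.random() < arm_means[arm] else 0.0

        # Incremental update of the empirical mean for the pulled arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return total_reward, counts, estimates


if __name__ == "__main__":
    total, counts, estimates = epsilon_greedy([0.2, 0.5, 0.8], n_trials=5000)
    print("total reward:", total)
    print("pulls per arm:", counts)
    print("estimated means:", [round(e, 3) for e in estimates])
```

A fixed ε keeps exploring at a constant rate forever, so a constant fraction of pulls is spent on random arms; variants that decay ε over time trade that cost against the risk of locking onto a suboptimal arm too early.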