We explore the two-armed bandit with Gaussian payoffs as a theoretical model for optimization. The problem is formulated from a Bayesian perspective, and the optimal strategy is derived for both one and two pulls. We identify regions of parameter space where a greedy strategy is provably optimal, and we compare the greedy and optimal strategies to one based on a genetic algorithm. In doing so, we correct a previous error in the literature concerning the Gaussian bandit problem and the supposed optimality of genetic algorithms for this problem. Finally, we provide an analytically simple bandit model that is more directly applicable to optimization theory than the traditional bandit problem, and we determine a near-optimal strategy for that model.
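To make the setup concrete, here is a minimal sketch of the Bayesian two-armed Gaussian bandit with a greedy strategy. This is not code from the paper: it assumes a known payoff variance and a conjugate Gaussian prior over each arm's mean, and the class and function names (`GaussianArm`, `greedy_bandit`) are illustrative.

```python
import random

class GaussianArm:
    """Arm with a Gaussian payoff of unknown mean and known variance.

    With a Gaussian prior on the mean, the posterior stays Gaussian,
    so belief about each arm is just (post_mean, post_var).
    """
    def __init__(self, true_mean, payoff_var=1.0, prior_mean=0.0, prior_var=100.0):
        self.true_mean = true_mean
        self.payoff_var = payoff_var
        self.post_mean = prior_mean
        self.post_var = prior_var

    def pull(self, rng):
        reward = rng.gauss(self.true_mean, self.payoff_var ** 0.5)
        # Standard Gaussian-Gaussian conjugate update of the posterior.
        precision = 1.0 / self.post_var + 1.0 / self.payoff_var
        self.post_mean = (self.post_mean / self.post_var
                          + reward / self.payoff_var) / precision
        self.post_var = 1.0 / precision
        return reward

def greedy_bandit(arms, pulls, rng):
    """Always pull the arm with the highest posterior mean (ties -> first arm).

    This is the purely exploitative strategy the abstract contrasts with
    the optimal one: it can lock onto an inferior arm after an unlucky draw.
    """
    total = 0.0
    for _ in range(pulls):
        best = max(arms, key=lambda a: a.post_mean)
        total += best.pull(rng)
    return total

if __name__ == "__main__":
    rng = random.Random(0)
    arms = [GaussianArm(0.0), GaussianArm(1.0)]
    print(greedy_bandit(arms, 100, rng))
```

Note the failure mode built into greedy play: once one arm's posterior mean dips below the other's prior mean, the neglected arm may never be sampled again, which is exactly the exploration/exploitation tension the bandit model is meant to capture.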