We explore the two-armed bandit with Gaussian payoffs as a theoretical model for optimization. The problem is formulated from a Bayesian perspective, and the optimal strategy is derived for both one and two pulls. We identify regions of parameter space where a greedy strategy is provably optimal, and we compare the greedy and optimal strategies to one based on a genetic algorithm. In doing so, we correct a previous error in the literature concerning the Gaussian bandit problem and the supposed optimality of genetic algorithms for this problem. Finally, we provide an analytically simple bandit model that is more directly applicable to optimization theory than the traditional bandit problem, and we determine a near-optimal strategy for that model.
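To make the setup concrete, here is a minimal sketch of the Bayesian two-armed Gaussian bandit with a greedy strategy. This is not code from the paper: it assumes a known payoff variance and a conjugate Gaussian prior over each arm's mean, and the class and function names (`GaussianArm`, `greedy_bandit`) are illustrative.

```python
import random

class GaussianArm:
    """Arm with a Gaussian payoff of unknown mean and known variance.

    With a Gaussian prior on the mean, the posterior stays Gaussian,
    so belief about each arm is just (post_mean, post_var).
    """
    def __init__(self, true_mean, payoff_var=1.0, prior_mean=0.0, prior_var=100.0):
        self.true_mean = true_mean
        self.payoff_var = payoff_var
        self.post_mean = prior_mean
        self.post_var = prior_var

    def pull(self, rng):
        reward = rng.gauss(self.true_mean, self.payoff_var ** 0.5)
        # Standard Gaussian-Gaussian conjugate update of the posterior.
        precision = 1.0 / self.post_var + 1.0 / self.payoff_var
        self.post_mean = (self.post_mean / self.post_var
                          + reward / self.payoff_var) / precision
        self.post_var = 1.0 / precision
        return reward

def greedy_bandit(arms, pulls, rng):
    """Always pull the arm with the highest posterior mean (ties -> first arm).

    This is the purely exploitative strategy the abstract contrasts with
    the optimal one: it can lock onto an inferior arm after an unlucky draw.
    """
    total = 0.0
    for _ in range(pulls):
        best = max(arms, key=lambda a: a.post_mean)
        total += best.pull(rng)
    return total

if __name__ == "__main__":
    rng = random.Random(0)
    arms = [GaussianArm(0.0), GaussianArm(1.0)]
    print(greedy_bandit(arms, 100, rng))
```

Note the failure mode built into greedy play: once one arm's posterior mean dips below the other's prior mean, the neglected arm may never be sampled again, which is exactly the exploration/exploitation tension the bandit model is meant to capture.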