The multiarmed bandit is often used as an analogy for the tradeoff between exploration and exploitation in search problems. The classic problem involves allocating trials among the arms of a multiarmed slot machine to maximize the expected sum of rewards. We pose a new variation of the multiarmed bandit, the Max K-Armed Bandit, in which trials must be allocated among the arms to maximize the expected best single sample reward over the series of trials. The Max K-Armed Bandit is motivated by the problem of allocating restarts among a set of multistart stochastic search algorithms. We present an analysis of this Max K-Armed Bandit showing, under certain assumptions, that the optimal strategy allocates trials to the observed best arm at a rate that increases double exponentially relative to the other arms. This motivates an exploration strategy that follows a Boltzmann distribution with an exponentially decaying temperature parameter. We compare this exploration policy to policies that allocate trials to the observed best arm at rates faster (and slower) than double exponential. The results confirm, for two scheduling domains, that the double exponential increase in the rate of allocations to the observed best heuristic outperforms the other approaches.
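The exploration policy described above can be sketched in a few lines: arms are sampled with Boltzmann (softmax) probabilities over each arm's best observed reward, and the temperature decays exponentially so that allocations concentrate on the observed best arm at an accelerating rate. This is only an illustrative sketch, not the paper's implementation; the function name, the temperature schedule parameters `t0` and `decay`, and the use of zero-argument callables as arms are all assumptions made for the example.

```python
import math
import random

def max_k_armed_boltzmann(arms, n_trials, t0=1.0, decay=0.95):
    """Boltzmann exploration sketch for the max k-armed bandit.

    `arms` is a list of zero-argument callables, each returning one
    stochastic sample (e.g. the solution quality of one restart of a
    multistart heuristic).  Returns the best single sample observed
    over all trials, the quantity the max k-armed bandit maximizes.
    """
    # Prime each arm once so every arm has at least one observation.
    best = [arm() for arm in arms]       # best sample seen per arm
    temp = t0
    for _ in range(n_trials - len(arms)):
        # Softmax over best-so-far values; subtract the max for
        # numerical stability before exponentiating.
        m = max(best)
        weights = [math.exp((b - m) / temp) for b in best]
        total = sum(weights)
        # Sample an arm index in proportion to its weight.
        r, acc, choice = random.random() * total, 0.0, 0
        for i, w in enumerate(weights):
            acc += w
            if r <= acc:
                choice = i
                break
        best[choice] = max(best[choice], arms[choice]())
        temp *= decay                    # exponentially decaying temperature
    return max(best)                     # best single sample over all trials
```

As the temperature shrinks, the softmax sharpens toward the arm with the highest observed maximum, so the fraction of trials given to that arm grows rapidly, which is the qualitative behavior the analysis above calls for.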