A simple distribution-free approach to the max k-armed bandit problem

Authors:
Matthew J. Streeter; Stephen F. Smith
Affiliations:
Computer Science Department and Center for the Neural Basis of Cognition; The Robotics Institute, Carnegie Mellon University, Pittsburgh, PA
Venue:
CP'06 Proceedings of the 12th international conference on Principles and Practice of Constraint Programming
Year:
2006

Abstract

The max k-armed bandit problem is a recently introduced online optimization problem with practical applications to heuristic search. Given a set of k slot machines, each yielding a payoff from a fixed (but unknown) distribution, we wish to allocate trials to the machines so as to maximize the maximum payoff received over a series of n trials. Previous work on the max k-armed bandit problem has assumed that payoffs are drawn from generalized extreme value (GEV) distributions. In this paper we present a simple algorithm, based on an algorithm for the classical k-armed bandit problem, that solves the max k-armed bandit problem effectively without making strong distributional assumptions. We demonstrate the effectiveness of our approach by applying it to the task of selecting among priority dispatching rules for the resource-constrained project scheduling problem with maximal time lags (RCPSP/max).
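
The problem setting described in the abstract can be illustrated with a small simulation. This is a minimal sketch of the objective only (the best single payoff over n trials), not the algorithm presented in the paper. The three arm distributions, the names `run_max_bandit`, `round_robin`, and `make_epsilon_greedy`, and the epsilon-greedy rule itself are hypothetical choices made for illustration.

```python
import random

def run_max_bandit(arms, n, choose, rng):
    """Allocate n trials among the arms; return (best payoff seen, pulls per arm)."""
    k = len(arms)
    best_seen = [float("-inf")] * k  # best payoff observed so far on each arm
    pulls = [0] * k
    for t in range(n):
        i = choose(t, best_seen, pulls, rng)
        payoff = arms[i](rng)
        pulls[i] += 1
        best_seen[i] = max(best_seen[i], payoff)
    # The max k-armed bandit objective is the single best payoff, not the sum.
    return max(best_seen), pulls

def round_robin(t, best_seen, pulls, rng):
    """Baseline: spread the trials evenly over the arms."""
    return t % len(pulls)

def make_epsilon_greedy(epsilon):
    """Illustrative heuristic (an assumption, not the paper's method):
    mostly pull the arm whose best observed payoff is highest."""
    def choose(t, best_seen, pulls, rng):
        k = len(pulls)
        if t < k:
            return t  # pull every arm once before comparing them
        if rng.random() < epsilon:
            return rng.randrange(k)  # occasional random exploration
        return max(range(k), key=lambda i: best_seen[i])
    return choose

# Hypothetical arms: each maps a random generator to one payoff.
arms = [
    lambda rng: rng.uniform(0.0, 1.0),
    lambda rng: rng.gauss(0.5, 0.05),
    lambda rng: rng.expovariate(4.0),  # low mean, but a heavier right tail
]

n = 300
max_rr, pulls_rr = run_max_bandit(arms, n, round_robin, random.Random(0))
max_eg, pulls_eg = run_max_bandit(arms, n, make_epsilon_greedy(0.1), random.Random(0))
print("round robin:   ", max_rr, pulls_rr)
print("epsilon-greedy:", max_eg, pulls_eg)
```

Because only the maximum counts, an arm with a low mean but a heavy upper tail can be the best choice, which is what separates this setting from the classical k-armed bandit, where the sum of payoffs is maximized.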