We generalise classical multi-armed bandits to allow for the distribution of a (fixed amount of a) divisible resource among the constituent bandits at each decision point. Bandit activation consumes amounts of the available resource, which may vary by bandit and state. Any collection of bandits may be activated at any decision epoch, provided they do not consume more resource than is available. We propose suitable bandit indices that reduce to those proposed by Gittins [Gittins, J. C. 1979. Bandit processes and dynamic allocation indices (with discussion). J. R. Statist. Soc. B 41 148--177] for the classical models. The index that emerges is an elegant generalisation of the Gittins index, measuring in a natural way the reward earnable from a bandit per unit of resource consumed. The paper discusses both how such indices may be computed and how they may be used to construct heuristics for resource distribution. We also describe how to develop bounds on the closeness to optimality of index heuristics and demonstrate a form of asymptotic optimality for a greedy index heuristic in a class of simple models. A numerical study testifies to the strong performance of a weighted index heuristic.
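To illustrate the kind of greedy index heuristic the abstract refers to, here is a minimal sketch in Python. It assumes the reward-per-unit-resource index of each bandit's current state has already been computed (the paper's own index computation is not reproduced here); the function name, the tuple layout, and the example data are all hypothetical, chosen only to show the resource-constrained greedy selection step.

```python
def greedy_index_heuristic(bandits, budget):
    """Greedy activation under a resource budget (illustrative sketch).

    `bandits` is a list of (name, index, consumption) tuples, where
    `index` is the (assumed precomputed) reward earned per unit of
    resource consumed in the bandit's current state, and `consumption`
    is the resource its activation would use.

    Bandits are considered in decreasing index order; each is activated
    if its consumption still fits within the remaining budget.
    """
    chosen = []
    remaining = budget
    for name, index, consumption in sorted(bandits, key=lambda b: -b[1]):
        if consumption <= remaining:
            chosen.append(name)
            remaining -= consumption
    return chosen


# Hypothetical example: three bandits and a budget of 6 resource units.
# "C" (index 3.0) is taken first; "A" no longer fits, so "B" follows.
print(greedy_index_heuristic(
    [("A", 2.0, 3), ("B", 1.5, 2), ("C", 3.0, 4)], budget=6))
```

Note that, as with any greedy knapsack-style rule, the selection need not be optimal at a single epoch; the abstract's asymptotic-optimality result concerns a class of simple models rather than arbitrary instances.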