We generalise classical multi-armed bandits to allow for the distribution of a (fixed amount of a) divisible resource among the constituent bandits at each decision point. Bandit activation consumes amounts of the available resource, which may vary by bandit and state. Any collection of bandits may be activated at any decision epoch, provided they do not consume more resource than is available. We propose suitable bandit indices that reduce to those proposed by Gittins [Gittins, J. C. 1979. Bandit processes and dynamic allocation indices (with discussion). J. R. Statist. Soc. B 41 148--177] for the classical models. The index that emerges is an elegant generalisation of the Gittins index, measuring in a natural way the reward earnable from a bandit per unit of resource consumed. The paper discusses both how such indices may be computed and how they may be used to construct heuristics for resource distribution. We also describe how to develop bounds on the closeness to optimality of index heuristics and demonstrate a form of asymptotic optimality for a greedy index heuristic in a class of simple models. A numerical study testifies to the strong performance of a weighted index heuristic.
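To illustrate the kind of greedy index heuristic the abstract refers to, here is a minimal sketch in Python. It assumes the reward-per-unit-resource index of each bandit's current state has already been computed (the paper's own index computation is not reproduced here); the function name, the tuple layout, and the example data are all hypothetical, chosen only to show the resource-constrained greedy selection step.

```python
def greedy_index_heuristic(bandits, budget):
    """Greedy activation under a resource budget (illustrative sketch).

    `bandits` is a list of (name, index, consumption) tuples, where
    `index` is the (assumed precomputed) reward earned per unit of
    resource consumed in the bandit's current state, and `consumption`
    is the resource its activation would use.

    Bandits are considered in decreasing index order; each is activated
    if its consumption still fits within the remaining budget.
    """
    chosen = []
    remaining = budget
    for name, index, consumption in sorted(bandits, key=lambda b: -b[1]):
        if consumption <= remaining:
            chosen.append(name)
            remaining -= consumption
    return chosen


# Hypothetical example: three bandits and a budget of 6 resource units.
# "C" (index 3.0) is taken first; "A" no longer fits, so "B" follows.
print(greedy_index_heuristic(
    [("A", 2.0, 3), ("B", 1.5, 2), ("C", 3.0, 4)], budget=6))
```

Note that, as with any greedy knapsack-style rule, the selection need not be optimal at a single epoch; the abstract's asymptotic-optimality result concerns a class of simple models rather than arbitrary instances.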