A Faster Index Algorithm and a Computational Study for Bandits with Switching Costs

Authors:
José Niño-Mora
Affiliations:
Department of Statistics, Universidad Carlos III de Madrid, C/Madrid 126, 28903 Getafe (Madrid), Spain
Venue:
INFORMS Journal on Computing
Year:
2008

Citing 0
Cited 5

Characterization and computation of restless bandit marginal productivity indices
Proceedings of the 2nd international conference on Performance evaluation methodologies and tools

Computing an index policy for bandits with switching penalties
Proceedings of the 2nd international conference on Performance evaluation methodologies and tools

A Marginal Productivity Index Rule for Scheduling Multiclass Queues with Setups
Network Control and Optimization

Rejoinder---Response to Comments on “Website Morphing”
Marketing Science

Computing an index policy for multiarmed bandits with deadlines
Proceedings of the 3rd International Conference on Performance Evaluation Methodologies and Tools

Abstract

We address the intractable multi-armed bandit problem with switching costs, for which an index that partially characterizes optimal policies was introduced (Asawa, M., D. Teneketzis. 1996. Multi-armed bandits with switching penalties. IEEE Trans. Automatic Control 41 328-348), attaching to each project state a “continuation index” (its Gittins index) and a “switching index.” Asawa and Teneketzis proposed to jointly compute both as the Gittins index of a project with 2n states (when the original project has n states), resulting in an eightfold increase in the O(n^3) arithmetic operations relative to those needed to compute the continuation index alone. We present a faster decoupled computation method, which in a first stage computes the continuation index and then, in a second stage, computes the switching index an order of magnitude faster, in at most n^2 + O(n) arithmetic operations, achieving overall a fourfold reduction in arithmetic operations and substantially reduced memory operations. The analysis exploits the fact that the Asawa and Teneketzis index is the marginal productivity index of the project in its restless reformulation, using methods introduced by the author. Extensive computational experiments are reported, which demonstrate the dramatic runtime speedups achieved by the new algorithm, as well as the near optimality of the resultant index policy and its substantial gains against the benchmark Gittins index policy across a wide range of randomly generated two- and three-project instances.
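
For readers unfamiliar with the continuation (Gittins) index mentioned in the abstract, the following is a minimal illustrative sketch of how such an index can be computed for a small finite-state project. It is not the algorithm of the paper: it uses a naive textbook largest-index-first recursion that re-solves a linear system at every step, so it costs on the order of n^4 operations rather than O(n^3). The function names `solve` and `gittins_indices`, the reward-rate normalization of the index, and the example data are assumptions made here for illustration only.

```python
from typing import List, Optional, Sequence


def solve(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            f = m[row][col] / m[col][col]
            for c in range(col, n + 1):
                m[row][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def gittins_indices(P: Sequence[Sequence[float]],
                    r: Sequence[float],
                    beta: float) -> List[float]:
    """Naive largest-index-first computation (hypothetical helper).

    P is the transition matrix, r the one-period rewards, 0 < beta < 1
    the discount factor. States are ranked from highest to lowest index;
    the index of a state is the ratio of expected discounted reward to
    expected discounted time when playing once and then continuing only
    while the project stays in already-ranked (higher-index) states.
    """
    n = len(r)
    index: List[Optional[float]] = [None] * n
    first = max(range(n), key=lambda i: r[i])
    index[first] = r[first]  # the top state's index is its own reward
    cont = {first}           # continuation set: states ranked so far
    while len(cont) < n:
        # A = I - beta * Q, where Q keeps only transitions into `cont`.
        A = [[(1.0 if i == j else 0.0)
              - (beta * P[i][j] if j in cont else 0.0)
              for j in range(n)] for i in range(n)]
        d = solve(A, list(r))     # discounted reward until leaving cont
        b = solve(A, [1.0] * n)   # discounted time until leaving cont
        best = max((i for i in range(n) if i not in cont),
                   key=lambda i: d[i] / b[i])
        index[best] = d[best] / b[best]
        cont.add(best)
    return [float(v) for v in index]


# Two-state example: state 0 pays 0 and moves to state 1, which pays 1 forever.
print(gittins_indices([[0.0, 1.0], [0.0, 1.0]], [0.0, 1.0], 0.5))  # [0.5, 1.0]
```

In the example, state 0 earns nothing now but leads to the rewarding state, so its index (0.5) sits between its own reward and that of state 1. Applying a cubic-cost method to a project with 2n states instead of n multiplies the work by (2n)^3 / n^3 = 8, which is the eightfold increase the abstract refers to.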