In this paper, we consider the restless bandit problem, one of the most well-studied generalizations of the celebrated stochastic multi-armed bandit problem in decision theory. In full generality, the restless bandit problem is known to be PSPACE-hard to approximate to any non-trivial factor, and little progress has been made on it despite its significance in modeling activity allocation under uncertainty. We make progress by showing that for an interesting and general subclass that we term Monotone bandits, a surprisingly simple and intuitive greedy policy yields a factor-2 approximation. Such greedy policies are termed index policies, and are popular due to their simplicity and their optimality for the stochastic multi-armed bandit problem. The Monotone bandit problem strictly generalizes the stochastic multi-armed bandit problem, and naturally models multi-project scheduling in which the state of a project becomes increasingly uncertain while the project is not scheduled. We develop several novel techniques in the design and analysis of the index policy. Our algorithm proceeds by introducing a novel "balance" constraint into the dual of a well-known LP relaxation of the restless bandit problem. This is followed by a structural characterization of the optimal solution using both the exact primal and dual complementary slackness conditions. This characterization yields an interpretation of the dual variables as potential functions, from which we derive the index policy and the associated analysis.
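To make the notion of an index policy concrete, the following is a minimal, purely illustrative sketch: each arm carries an index computed from its current state, and the policy greedily plays the highest-index arm at every step. The `Arm` class, its relaxation dynamics, and `greedy_index_policy` are hypothetical stand-ins invented for this sketch; they are not the paper's Monotone bandit model, and the paper's actual index is derived from LP dual variables rather than from the raw state.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Arm:
    # Toy arm: its "state" is its current expected reward. After being
    # played, the state relaxes halfway toward a resting value `rest`.
    # This is a stand-in for restless dynamics, not the paper's model.
    state: float
    rest: float = 0.5

    def play(self) -> float:
        reward = self.state
        self.state = 0.5 * (self.state + self.rest)
        return reward


def greedy_index_policy(arms: List[Arm], horizon: int,
                        index: Callable[[Arm], float]) -> float:
    """At each step, play the arm whose current index is largest."""
    total = 0.0
    for _ in range(horizon):
        best = max(arms, key=index)  # greedy index rule
        total += best.play()
    return total
```

For example, with two arms starting at states 0.9 and 0.2 and the index taken to be the state itself, the policy plays the first arm three times, collecting 0.9 + 0.7 + 0.6 = 2.2 as its state decays toward 0.5.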