In this paper, we consider the restless bandit problem, one of the most well-studied generalizations of the celebrated stochastic multi-armed bandit problem in decision theory. In full generality, the restless bandit problem is known to be PSPACE-hard to approximate to any non-trivial factor, and little progress has been made on it despite its significance in modeling activity allocation under uncertainty. We make progress by showing that for an interesting and general subclass that we term Monotone bandits, a surprisingly simple and intuitive greedy policy yields a factor-2 approximation. Such greedy policies are termed index policies, and are popular due to their simplicity and their optimality for the stochastic multi-armed bandit problem. The Monotone bandit problem strictly generalizes the stochastic multi-armed bandit problem, and naturally models multi-project scheduling in which the state of a project becomes increasingly uncertain while the project is not scheduled. We develop several novel techniques in the design and analysis of the index policy. Our algorithm proceeds by introducing a novel "balance" constraint into the dual of a well-known LP relaxation of the restless bandit problem. This is followed by a structural characterization of the optimal solution using both the exact primal and dual complementary slackness conditions. This characterization yields an interpretation of the dual variables as potential functions, from which we derive the index policy and the associated analysis.
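To make the notion of an index policy concrete, the following is a minimal, purely illustrative sketch: each arm carries an index computed from its current state, and the policy greedily plays the highest-index arm at every step. The `Arm` class, its relaxation dynamics, and `greedy_index_policy` are hypothetical stand-ins invented for this sketch; they are not the paper's Monotone bandit model, and the paper's actual index is derived from LP dual variables rather than from the raw state.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Arm:
    # Toy arm: its "state" is its current expected reward. After being
    # played, the state relaxes halfway toward a resting value `rest`.
    # This is a stand-in for restless dynamics, not the paper's model.
    state: float
    rest: float = 0.5

    def play(self) -> float:
        reward = self.state
        self.state = 0.5 * (self.state + self.rest)
        return reward


def greedy_index_policy(arms: List[Arm], horizon: int,
                        index: Callable[[Arm], float]) -> float:
    """At each step, play the arm whose current index is largest."""
    total = 0.0
    for _ in range(horizon):
        best = max(arms, key=index)  # greedy index rule
        total += best.play()
    return total
```

For example, with two arms starting at states 0.9 and 0.2 and the index taken to be the state itself, the policy plays the first arm three times, collecting 0.9 + 0.7 + 0.6 = 2.2 as its state decays toward 0.5.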