Restless Bandits, Linear Programming Relaxations, and a Primal-Dual Index Heuristic

Authors:
Dimitris Bertsimas;José Niño-Mora
Affiliations:
-;-
Venue:
Operations Research
Year:
2000

Citing 0
Cited 26

Optimal Stopping and Gittins' Indices for Piecewise Deterministic Evolution Processes

Discrete Event Dynamic Systems
A Topic-Specific Web Robot Model Based on Restless Bandits

IEEE Internet Computing
Dynamic Portfolio Selection of NPD Programs Using Marginal Returns

Management Science
Restless Bandit Marginal Productivity Indices, Diminishing Returns, and Optimal Control of Make-to-Order/Make-to-Stock M/G/1 Queues

Mathematics of Operations Research
Approximation algorithms for budgeted learning problems

Proceedings of the thirty-ninth annual ACM symposium on Theory of computing
Linear programming relaxations and marginal productivity index policies for the buffer sharing problem

Queueing Systems: Theory and Applications
The ratio index for budgeted learning, with applications

SODA '09 Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms
Approximation algorithms for restless bandit problems

SODA '09 Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms
Optimality of myopic sensing in multichannel opportunistic access

IEEE Transactions on Information Theory
On the Gittins index in the M/G/1 queue

Queueing Systems: Theory and Applications
Coding and control for communication networks

Queueing Systems: Theory and Applications
A distributed network selection scheme in next generation heterogeneous wireless networks

WCNC'09 Proceedings of the 2009 IEEE conference on Wireless Communications & Networking Conference
The stochastic machine replenishment problem

IPCO'08 Proceedings of the 13th international conference on Integer programming and combinatorial optimization
Distributed optimal relay selection for QoS provisioning in wireless multi-hop cooperative networks

GLOBECOM'09 Proceedings of the 28th IEEE conference on Global telecommunications
Energy efficient distributed relay selection in wireless cooperative networks with finite state Markov channels

GLOBECOM'09 Proceedings of the 28th IEEE conference on Global telecommunications
Distributed hierarchical key management scheme in mobile ad hoc networks

GLOBECOM'09 Proceedings of the 28th IEEE conference on Global telecommunications
Optimal channel access for TCP performance improvement in cognitive radio networks: a cross-layer design approach

GLOBECOM'09 Proceedings of the 28th IEEE conference on Global telecommunications
Distributed multi-source transmission in wireless mobile peer-to-peer networks: a restless bandit approach

ICC'09 Proceedings of the 2009 IEEE international conference on Communications
Optimal network selection in heterogeneous wireless multimedia networks

ICC'09 Proceedings of the 2009 IEEE international conference on Communications
Optimal network selection in heterogeneous wireless multimedia networks

Wireless Networks
A hierarchical identity based key management scheme in tactical mobile ad hoc networks

MILCOM'09 Proceedings of the 28th IEEE conference on Military communications
Approximation algorithms for restless bandit problems

Journal of the ACM (JACM)
Optimal channel access for TCP performance improvement in cognitive radio networks

Wireless Networks
The Irrevocable Multiarmed Bandit Problem

Operations Research
The Knowledge Gradient Algorithm for a General Class of Online Learning Problems

Operations Research
Green Access Point Selection for Wireless Local Area Networks Enhanced by Cognitive Radio

Mobile Networks and Applications

Quantified Score

Hi-index	0.06

Abstract

We develop a mathematical programming approach for the classical PSPACE-hard restless bandit problem in stochastic optimization. We introduce a hierarchy of N (where N is the number of bandits) increasingly stronger linear programming relaxations, the last of which is exact and corresponds to the (exponential-size) formulation of the problem as a Markov decision chain, while the other relaxations provide bounds and are efficiently computed. We also propose a priority-index heuristic scheduling policy from the solution to the first-order relaxation, where the indices are defined in terms of optimal dual variables. In this way we propose a policy and a suboptimality guarantee. We report results of computational experiments that suggest that the proposed heuristic policy is nearly optimal. Moreover, the second-order relaxation is found to provide strong bounds on the optimal value.
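
As a rough illustration of what a first-order relaxation of this kind can look like, the sketch below writes one in occupation-measure form, together with its LP dual. The setting and all notation are assumptions made for illustration and are not taken from the abstract: a discounted-reward model with discount factor \beta, exactly M of the N bandits active in each period, x^a_{i_n} the expected discounted time bandit n spends in state i_n under action a (1 = active, 0 = passive), R^a_{i_n} the one-period reward, p^a_{i_n j_n} the transition probabilities, and \alpha_{j_n} the initial state distribution. The relaxation keeps each bandit's own flow-balance constraints and enforces the activity requirement only in expectation.

```latex
% First-order relaxation (sketch; notation assumed, not quoted from the paper)
\begin{align*}
\max_{x \ge 0} \quad
  & \sum_{n=1}^{N} \sum_{i_n} \sum_{a \in \{0,1\}} R^{a}_{i_n}\, x^{a}_{i_n} \\
\text{s.t.} \quad
  & x^{0}_{j_n} + x^{1}_{j_n}
    = \alpha_{j_n} + \beta \sum_{i_n} \sum_{a \in \{0,1\}} p^{a}_{i_n j_n}\, x^{a}_{i_n}
    && \text{for all } n,\ j_n, \\
  & \sum_{n=1}^{N} \sum_{i_n} x^{1}_{i_n} = \frac{M}{1-\beta}.
\end{align*}

% LP dual: \lambda_{i_n} prices the flow-balance rows, \lambda_0 the activity row
\begin{align*}
\min_{\lambda,\ \lambda_0} \quad
  & \sum_{n=1}^{N} \sum_{j_n} \alpha_{j_n}\, \lambda_{j_n} + \frac{M}{1-\beta}\, \lambda_0 \\
\text{s.t.} \quad
  & \lambda_{i_n} - \beta \sum_{j_n} p^{0}_{i_n j_n}\, \lambda_{j_n} \ge R^{0}_{i_n}
    && \text{for all } n,\ i_n, \\
  & \lambda_{i_n} - \beta \sum_{j_n} p^{1}_{i_n j_n}\, \lambda_{j_n} + \lambda_0 \ge R^{1}_{i_n}
    && \text{for all } n,\ i_n.
\end{align*}
```

Under these assumptions the primal has two variables per bandit state, so its size grows with the sum of the individual state spaces rather than with their product, which is consistent with the abstract's remark that the lower-order relaxations are efficiently computed. The abstract says only that the priority indices are defined in terms of optimal dual variables; the exact index definition is not given there, so none is reproduced here.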