Indexability of bandit problems with response delays

Authors:
Felipe Caro; Onesun Steve Yoo
Affiliations:
UCLA Anderson School of Management, Los Angeles, CA 90095; e-mail: fcaro@anderson.ucla.edu, onesun.yoo.2010@anderson.ucla.edu
Venue:
Probability in the Engineering and Informational Sciences
Year:
2010



Abstract

This article considers an important class of discrete-time restless bandits, given by the discounted multiarmed bandit problems with response delays. The delays in each period are independent random variables, and the delayed responses do not cross over. For a bandit arm in this class, we use a coupling argument to show that in each state there is a unique subsidy that equates the pulling and nonpulling actions (i.e., the bandit satisfies the indexability criterion introduced by Whittle (1988)). The result allows for an infinite or finite horizon and holds for arbitrary delay lengths and infinite state spaces. We compute the resulting marginal productivity indexes (MPI) for the Beta-Bernoulli Bayesian learning model, formulate and compute a tractable upper bound, and compare the suboptimality gap of the MPI policy to those of other heuristics derived from different closed-form indexes. The MPI policy performs near optimally and provides a theoretical justification for the use of the other heuristics.
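
As a rough illustration of the setting only, and not the authors' index computation, the following Python sketch shows how a Beta-Bernoulli arm might track responses that arrive late but never cross over. The class name DelayedBetaArm, its methods, and the uniform Beta(1, 1) prior are hypothetical choices made for this example. A first-in, first-out queue is one simple way to model the no-crossover assumption: an outcome can only be revealed after every earlier pull has been revealed.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class DelayedBetaArm:
    """Hypothetical Beta-Bernoulli arm whose responses arrive late, in pull order."""

    a: float = 1.0  # assumed prior pseudo-count of successes (uniform prior)
    b: float = 1.0  # assumed prior pseudo-count of failures
    pending: deque = field(default_factory=deque)  # outcomes not yet observed

    def pull(self, outcome: int) -> None:
        """Record a pull; its 0/1 outcome stays hidden until delivered."""
        self.pending.append(outcome)

    def deliver(self, count: int = 1) -> None:
        """Reveal the oldest `count` pending outcomes (FIFO, so no crossover)."""
        for _ in range(min(count, len(self.pending))):
            outcome = self.pending.popleft()
            self.a += outcome        # a success increments a
            self.b += 1 - outcome    # a failure increments b

    def posterior_mean(self) -> float:
        """Posterior mean success probability given delivered outcomes only."""
        return self.a / (self.a + self.b)


arm = DelayedBetaArm()
for outcome in (1, 0, 1):   # three pulls whose responses are still in transit
    arm.pull(outcome)
mean_before = arm.posterior_mean()  # belief is unchanged while responses are delayed
arm.deliver(1)                      # the oldest response (a success) arrives
mean_after = arm.posterior_mean()
```

In this sketch the arm's state is the pair (a, b) together with the queue of outstanding responses, which is why delays enlarge the state space compared with the classical bandit.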