The Irrevocable Multiarmed Bandit Problem

Authors:
Vivek F. Farias; Ritesh Madan
Affiliations:
Operations Research Center, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139; Qualcomm New Jersey Research Center (NJRC), Bridgewater, New Jersey 08807
Venue:
Operations Research
Year:
2011

Citing 7
Cited 1

Minimization methods for non-differentiable functions
Restless Bandits, Linear Programming Relaxations, and a Primal-Dual Index Heuristic. Operations Research
Convex Optimization
Dynamic Assortment with Demand Learning for Seasonal Consumer Goods. Management Science
Approximation algorithms for budgeted learning problems. Proceedings of the thirty-ninth annual ACM symposium on Theory of computing
A Learning Approach for Interactive Marketing to a Customer Segment. Operations Research
Optimal employee retention when inferring unknown learning curves. Proceedings of the Winter Simulation Conference

Optimal Dynamic Assortment Planning with Demand Learning. Manufacturing & Service Operations Management

Abstract

This paper considers the multiarmed bandit problem with multiple simultaneous arm pulls and the additional restriction that we do not allow recourse to arms that were pulled at some point in the past but then discarded. This additional restriction is highly desirable from an operational perspective, and we refer to this problem as the “irrevocable multiarmed bandit” problem. We observe that natural modifications to well-known heuristics for multiarmed bandit problems that satisfy this irrevocability constraint have unsatisfactory performance and, thus motivated, introduce a new heuristic: the “packing” heuristic. We establish through numerical experiments that the packing heuristic offers excellent performance, even relative to heuristics that are not constrained to be irrevocable. We also provide a theoretical analysis that studies the “price” of irrevocability, i.e., the performance loss incurred in imposing the constraint we propose on the multiarmed bandit model. We show that this performance loss is uniformly bounded for a general class of multiarmed bandit problems and indicate its dependence on various problem parameters. Finally, we obtain a computationally fast algorithm to implement the packing heuristic; the algorithm renders the packing heuristic computationally cheaper than methods that rely on the computation of Gittins indices.