Consider the problem of sequential sampling from m statistical populations to maximize the expected sum of outcomes in the long run. Under suitable assumptions on the unknown parameters θ = (θ_1, ..., θ_m), it is shown that there exists a class C_R of adaptive policies with the following properties: (i) The expected n-horizon reward V_n^{π^0}(θ) under any policy π^0 in C_R is equal to n μ*(θ) − M(θ) log n + o(log n), as n → ∞, where μ*(θ) is the largest population mean and M(θ) is a constant. (ii) Policies in C_R are asymptotically optimal within a larger class C_UF of "uniformly fast convergent" policies, in the sense that lim_{n→∞} (n μ*(θ) − V_n^π(θ)) / (n μ*(θ) − V_n^{π^0}(θ)) ≥ 1 for any π ∈ C_UF and any θ such that M(θ) > 0. Policies in C_R are specified via easily computable indices, defined as unique solutions to dual problems that arise naturally from the functional form of M(θ). In addition, the assumptions are verified for populations specified by nonparametric discrete univariate distributions with finite support. In the case of normal populations with unknown means and variances, we leave as an open problem the verification of one assumption.