Active Learning in Multi-armed Bandits

  • Authors:
  • András Antos;Varun Grover;Csaba Szepesvári

  • Affiliations:
Computer and Automation Research Institute of the Hungarian Academy of Sciences, Budapest, Hungary 1111;Department of Computing Science, University of Alberta, Edmonton, Canada T6G 2E8;Computer and Automation Research Institute of the Hungarian Academy of Sciences, Budapest, Hungary 1111 and Department of Computing Science, University of Alberta, Edmonton, Canada T6G 2E8

  • Venue:
  • ALT '08 Proceedings of the 19th international conference on Algorithmic Learning Theory
  • Year:
  • 2008

Abstract

In this paper we consider the problem of actively learning the mean values of distributions associated with a finite number of options (arms). The algorithms can select which option to generate the next sample from, in order to produce estimates with equally good precision for all the distributions. When an algorithm uses sample means to estimate the unknown values, the optimal solution, assuming full knowledge of the distributions, is to sample each option in proportion to its variance. In this paper we propose an incremental algorithm that asymptotically achieves the same loss as an optimal rule. We prove that the excess loss suffered by this algorithm, apart from logarithmic factors, scales as n^{-3/2}, which we conjecture to be the optimal rate. The performance of the algorithm is illustrated on a simple problem.
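The allocation idea in the abstract can be illustrated with a minimal sketch. The code below is not the paper's algorithm; it is a simple greedy heuristic, under assumed Gaussian arms with made-up parameters, that pulls the arm with the largest estimated variance-to-count ratio, so that pull counts drift toward proportionality with the arms' variances.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical arm distributions (unknown to the learner); the means and
# standard deviations here are illustrative assumptions, not from the paper.
true_means = np.array([0.0, 1.0, -0.5])
true_stds = np.array([0.5, 2.0, 1.0])
K = len(true_means)

def pull(k):
    """Draw one sample from arm k."""
    return rng.normal(true_means[k], true_stds[k])

n = 3000
# Initialize: pull each arm twice so empirical variances are defined.
samples = [[pull(k) for _ in range(2)] for k in range(K)]

for _ in range(n - 2 * K):
    # Empirical variances, floored so no arm is permanently starved.
    var_hat = np.array([max(np.var(s, ddof=1), 1e-6) for s in samples])
    counts = np.array([len(s) for s in samples])
    # Greedy allocation: sample the arm whose estimated variance-to-count
    # ratio is largest, pushing counts toward proportionality with variance.
    k = int(np.argmax(var_hat / counts))
    samples[k].append(pull(k))

counts = np.array([len(s) for s in samples])
print("final counts:     ", counts)
print("count fractions:  ", np.round(counts / counts.sum(), 3))
print("optimal fractions:", np.round(true_stds**2 / (true_stds**2).sum(), 3))
```

With these assumed variances (0.25, 4.0, 1.0), the optimal fractions are roughly 0.048, 0.762, and 0.190, and the greedy counts approach them as n grows. The paper's contribution is an incremental rule of this flavor together with a bound showing its excess loss decays at rate n^{-3/2} up to logarithmic factors.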