Upper-confidence-bound algorithms for active learning in multi-armed bandits
ALT'11 Proceedings of the 22nd international conference on Algorithmic learning theory
We consider the problem of actively learning the mean values of distributions associated with a finite number of options. The decision maker can select which option to generate the next observation from, the goal being to produce estimates with equally good precision for all the options. If sample means are used to estimate the unknown values, then the optimal solution, assuming that the distributions are known up to a shift, is to sample from each distribution proportionally to its variance. No information other than the distributions' variances is needed to compute the optimal solution. In this paper we propose an incremental algorithm that asymptotically achieves the same loss as an optimal rule. We prove that the excess loss suffered by this algorithm, apart from logarithmic factors, scales as n^{-3/2}, which we conjecture to be the optimal rate. The performance of the algorithm is illustrated on a simple problem.
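The static oracle allocation mentioned in the abstract can be sketched as follows: if arm k has variance sigma_k^2 and receives n_k of the n samples, the expected squared error of its sample mean is sigma_k^2 / n_k, so choosing n_k proportional to sigma_k^2 equalizes the precision across arms. The snippet below is a minimal illustrative sketch of that allocation rule only (with made-up variances), not the incremental learning algorithm proposed in the paper, which must estimate the variances online.

```python
def optimal_allocation(variances, n):
    """Static oracle allocation: give arm k a share of the n samples
    proportional to its variance sigma_k^2 (at least one sample each).
    Illustrative sketch only; the paper's algorithm learns this online."""
    total = sum(variances)
    return [max(1, round(n * v / total)) for v in variances]

def estimation_losses(variances, counts):
    """Expected squared error of each arm's sample mean: sigma_k^2 / n_k."""
    return [v / c for v, c in zip(variances, counts)]

variances = [4.0, 1.0, 0.25]                    # hypothetical arm variances
counts = optimal_allocation(variances, 100)     # -> [76, 19, 5]
losses = estimation_losses(variances, counts)
print(counts)
print(losses)   # all three losses come out approximately equal (~0.05)
```

With these variances, the high-variance arm gets roughly 16 times as many samples as the low-variance arm, and the per-arm estimation errors end up nearly identical, which is exactly the "equally good precision" objective stated above.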