COLT '90 Proceedings of the third annual workshop on Computational learning theory
Randomized algorithms
An introduction to Kolmogorov complexity and its applications (2nd ed.)
A decision-theoretic generalization of on-line learning and an application to boosting
Journal of Computer and System Sciences - Special issue: 26th annual ACM symposium on the theory of computing (STOC'94), May 23–25, 1994, and second annual European conference on computational learning theory (EuroCOLT'95), March 13–15, 1995
Individual sequence prediction—upper bounds and application for complexity
COLT '99 Proceedings of the twelfth annual conference on Computational learning theory
The Nonstochastic Multiarmed Bandit Problem
SIAM Journal on Computing
Gambling in a rigged casino: The adversarial multi-armed bandit problem
FOCS '95 Proceedings of the 36th Annual Symposium on Foundations of Computer Science
Universal Artificial Intelligence: Sequential Decisions Based On Algorithmic Probability
Adaptive Online Prediction by Following the Perturbed Leader
The Journal of Machine Learning Research
Anytime algorithms for multi-armed bandit problems
SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithms
Efficient algorithms for online decision problems
Journal of Computer and System Sciences - Special issue: Learning theory 2003
The weighted majority algorithm
SFCS '89 Proceedings of the 30th Annual Symposium on Foundations of Computer Science
Defensive universal learning with experts
ALT'05 Proceedings of the 16th international conference on Algorithmic Learning Theory
Competitive collaborative learning
COLT'05 Proceedings of the 18th annual conference on Learning Theory
FPL analysis for adaptive bandits
SAGA'05 Proceedings of the Third international conference on Stochastic Algorithms: Foundations and Applications
Complexity-based induction systems: Comparisons and convergence theorems
IEEE Transactions on Information Theory
Online learning in adversarial Lipschitz environments
ECML PKDD'10 Proceedings of the 2010 European conference on Machine learning and knowledge discovery in databases: Part II
The nonstochastic multi-armed bandit problem, first studied by Auer, Cesa-Bianchi, Freund, and Schapire in 1995, is a game of repeatedly choosing one decision from a set of decisions (''experts'') under partial observation: in each round t, only the cost of the decision played is observable. A regret minimization algorithm plays this game while achieving sublinear regret relative to each decision. It is known that an adversary controlling the costs of the decisions can force on the player a regret growing as t^{1/2} in the time t. In this work, we propose the first algorithm for a countably infinite set of decisions that achieves a regret upper bounded by O(t^{1/2+ε}), i.e. arbitrarily close to the optimal order. To this aim, we build on the ''follow the perturbed leader'' principle, which dates back to work by Hannan in 1957. Our results hold against an adaptive adversary, for both the expected and high-probability regret of the learner w.r.t. each decision. In the second part of the paper, we consider reactive problem settings, that is, situations where the learner's decisions impact the future behaviour of the adversary, and a strong strategy can draw benefit from well-chosen past actions. We present a variant of our regret minimization algorithm which still has regret of order at most t^{1/2+ε} relative to such strong strategies, and even sublinear regret not exceeding O(t^{4/5}) w.r.t. the hypothetical (without external interference) performance of a strong strategy. We show how to combine the regret minimizer with a universal class of experts, given by the countable set of programs on some fixed universal Turing machine. This defines a universal learner with sublinear regret relative to any computable strategy.
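The bandit game and the ''follow the perturbed leader'' principle described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm (which handles countably many experts); it is a toy finite-arm version under the standard simplifications: the leader is perturbed with exponential noise, a small uniform exploration rate keeps playing probabilities bounded away from zero, and observed costs are importance-weighted so cumulative estimates stay roughly unbiased. The function name `fpl_bandit` and the parameters `eta` (perturbation scale) and `gamma` (exploration rate) are illustrative choices, not from the source.

```python
import random

def fpl_bandit(costs, n_arms, eta=0.1, gamma=0.1, seed=0):
    """Toy Follow-the-Perturbed-Leader sketch for the adversarial bandit game.

    `costs` is a sequence of per-round cost vectors with entries in [0, 1];
    in each round only the cost of the arm actually played is revealed.
    Returns the learner's total incurred cost.
    """
    rng = random.Random(seed)
    est = [0.0] * n_arms      # cumulative importance-weighted cost estimates
    total_cost = 0.0
    for cost_vec in costs:
        if rng.random() < gamma:
            # explore: pick an arm uniformly at random
            arm = rng.randrange(n_arms)
        else:
            # exploit: follow the leader after subtracting exponential
            # perturbations (mean 1/eta) from each arm's estimated cost
            perturbed = [est[i] - rng.expovariate(eta) for i in range(n_arms)]
            arm = min(range(n_arms), key=lambda i: perturbed[i])
        c = cost_vec[arm]     # only this single entry is observed
        total_cost += c
        # importance weighting: each arm is played with probability at least
        # gamma / n_arms, so dividing by that lower bound keeps the estimate
        # bounded (a common simplification in FPL bandit analyses)
        est[arm] += c * n_arms / gamma
    return total_cost
```

On an easy instance where one arm is always free and the other always costs 1, the learner quickly concentrates on the free arm, paying mainly for its forced exploration rounds; the sublinear-regret guarantees in the abstract formalize this behaviour against arbitrary (even adaptive) cost sequences.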