This work studies external regret in sequential prediction games with arbitrary payoffs (nonnegative or non-positive). External regret measures the difference between the payoff obtained by the forecasting strategy and the payoff of the best action. We focus on two important parameters: M, the largest absolute value of any payoff, and Q*, the sum of squared payoffs of the best action. Given these parameters, we first derive a simple new forecasting strategy with regret of order at most $\sqrt{Q^{*} \ln N} + M \ln N$, where N is the number of actions. We then extend the results to the case where the parameters are unknown and derive similar bounds. We also devise a refined analysis of the weighted majority forecaster, which yields bounds of the same flavour. Finally, we apply the proof techniques we develop to the adversarial multi-armed bandit setting, where we prove bounds on the performance of an online algorithm in the case where there is no lower bound on the probability of each action.
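To make the quantities in the abstract concrete, the following minimal sketch computes the external regret, M, and Q* for a small game with signed payoffs. It assumes a standard exponentially weighted average forecaster with a fixed learning rate `eta`. This is an illustrative stand-in, not the new strategy derived in the paper, and the function name `exp_weights_regret` is hypothetical.

```python
import math


def exp_weights_regret(payoffs, eta):
    """Run an exponentially weighted average forecaster on a payoff matrix.

    payoffs[t][i] is the (possibly negative) payoff of action i in round t.
    Returns (regret, M, Q_star), where regret is the best action's total
    payoff minus the forecaster's expected total payoff.
    """
    n_actions = len(payoffs[0])
    cumulative = [0.0] * n_actions  # cumulative payoff of each action so far
    forecaster_payoff = 0.0

    for round_payoffs in payoffs:
        # Weights proportional to exp(eta * cumulative payoff); the maximum
        # is subtracted before exponentiating for numerical stability.
        top = max(cumulative)
        weights = [math.exp(eta * (c - top)) for c in cumulative]
        total = sum(weights)
        probs = [w / total for w in weights]

        # Expected payoff of the forecaster in this round.
        forecaster_payoff += sum(p * x for p, x in zip(probs, round_payoffs))

        # Reveal the payoffs and update the cumulative totals.
        for i, x in enumerate(round_payoffs):
            cumulative[i] += x

    best = max(range(n_actions), key=lambda i: cumulative[i])
    regret = cumulative[best] - forecaster_payoff
    M = max(abs(x) for row in payoffs for x in row)   # largest absolute payoff
    Q_star = sum(row[best] ** 2 for row in payoffs)   # squared payoffs of best action
    return regret, M, Q_star


# Two actions over ten rounds: action 0 always pays +1, action 1 always pays -1.
payoffs = [[1.0, -1.0] for _ in range(10)]
regret, M, Q_star = exp_weights_regret(payoffs, eta=0.5)
```

In this toy game the best action is action 0, so Q* equals the number of rounds and M equals 1. The choice `eta=0.5` is arbitrary, whereas the bounds in the abstract depend on tuning the strategy using M and Q*.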