Improved second-order bounds for prediction with expert advice

Authors:
Nicolò Cesa-Bianchi; Yishay Mansour; Gilles Stoltz
Affiliations:
DSI, Università di Milano, Milano, Italy; School of Computer Science, Tel Aviv University, Tel Aviv, Israel; DMA, Ecole Normale Supérieure, Paris, France
Venue:
COLT'05 Proceedings of the 18th annual conference on Learning Theory
Year:
2005

Abstract

This work studies external regret in sequential prediction games with arbitrary payoffs, which may be positive or negative. External regret measures the difference between the payoff of the best action and the payoff obtained by the forecasting strategy. We focus on two important parameters: $M$, the largest absolute value of any payoff, and $Q^*$, the sum of the squared payoffs of the best action. Given these parameters, we first derive a new, simple forecasting strategy whose regret is at most of order $\sqrt{Q^* \ln N} + M \ln N$, where $N$ is the number of actions. We extend the results to the case where the parameters are unknown and derive similar bounds. We then devise a refined analysis of the weighted majority forecaster, which yields bounds of the same flavour. Finally, we apply the proof techniques we develop to the adversarial multi-armed bandit setting and prove bounds on the performance of an online algorithm in the case where there is no lower bound on the probability of each action.
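To make the quantities in the abstract concrete, the following is a minimal sketch, not the authors' new strategy or their refined analysis. It implements a textbook exponentially weighted (weighted majority) forecaster with a fixed learning rate and computes the external regret together with $M$ and $Q^*$. All function names (`ewa_probabilities`, `run_forecaster`, `regret_parameters`), the synthetic payoffs, the tie-breaking rule for the best action (lowest index), and the learning-rate choice are illustrative assumptions. The learning rate assumes the horizon $T$ and $M$ are known in advance. The constant-free expression $\sqrt{Q^* \ln N} + M \ln N$ is printed only as a reference scale, because the abstract states the bound up to order only.

```python
import math
import random
from typing import List, Sequence, Tuple


def ewa_probabilities(cum_payoffs: Sequence[float], eta: float) -> List[float]:
    """Exponential weights: p_i proportional to exp(eta * cumulative payoff of action i)."""
    top = max(cum_payoffs)  # subtract the max before exponentiating, for numerical stability
    weights = [math.exp(eta * (g - top)) for g in cum_payoffs]
    total = sum(weights)
    return [w / total for w in weights]


def run_forecaster(payoffs: Sequence[Sequence[float]], eta: float) -> float:
    """Return the forecaster's cumulative expected payoff.

    payoffs[t][i] is the payoff of action i at round t; payoffs may have any sign.
    """
    n = len(payoffs[0])
    cum = [0.0] * n
    total = 0.0
    for row in payoffs:
        p = ewa_probabilities(cum, eta)  # uses past payoffs only, before row is revealed
        total += sum(pi * xi for pi, xi in zip(p, row))
        for i in range(n):
            cum[i] += row[i]
    return total


def regret_parameters(payoffs: Sequence[Sequence[float]]) -> Tuple[int, float, float, float]:
    """Return (best action index, best cumulative payoff, M, Q*).

    Ties for the best action are broken by the lowest index (an assumption of this sketch).
    """
    n = len(payoffs[0])
    cum = [sum(row[i] for row in payoffs) for i in range(n)]
    best = max(range(n), key=lambda i: cum[i])
    m = max(abs(x) for row in payoffs for x in row)  # M: largest absolute payoff
    q_star = sum(row[best] ** 2 for row in payoffs)  # Q*: squared payoffs of the best action
    return best, cum[best], m, q_star


if __name__ == "__main__":
    rng = random.Random(0)
    T, N = 1000, 5
    # Hypothetical synthetic payoffs; action 0 gets a small positive drift.
    payoffs = [[rng.uniform(-1, 1) + (0.1 if i == 0 else 0.0) for i in range(N)]
               for _ in range(T)]
    best, best_total, M, q_star = regret_parameters(payoffs)
    # Fixed learning rate minimizing the classical bound ln(N)/eta + eta*T*M^2/2.
    eta = math.sqrt(2 * math.log(N) / (T * M ** 2))
    regret = best_total - run_forecaster(payoffs, eta)
    print(f"external regret = {regret:.3f}")
    print(f"first-order scale M*sqrt(2 T ln N) = {M * math.sqrt(2 * T * math.log(N)):.3f}")
    print(f"second-order scale sqrt(Q* ln N) + M ln N = "
          f"{math.sqrt(q_star * math.log(N)) + M * math.log(N):.3f}")
```

With this tuning, Hoeffding's lemma gives the classical guarantee that the regret is at most $M\sqrt{2T\ln N}$ for every payoff sequence. Since $Q^* \le T M^2$, a second-order bound of order $\sqrt{Q^* \ln N} + M \ln N$ is never worse up to constants and the additive $M \ln N$ term. It can be much smaller when the best action's payoffs are small in magnitude.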