Improved second-order bounds for prediction with expert advice

Authors:
Nicolò Cesa-Bianchi;Yishay Mansour;Gilles Stoltz
Affiliations:
DSI, Università di Milano, Milano, Italy 20135;School of Computer Science, Tel-Aviv University, Tel Aviv, Israel;CNRS and Département de Mathématiques et Applications, Ecole Normale Supérieure, Paris, France 75005
Venue:
Machine Learning
Year:
2007

Citing 9
Cited 18

The weighted majority algorithm

Information and Computation
How to use expert advice

Journal of the ACM (JACM)
A decision-theoretic generalization of on-line learning and an application to boosting

Journal of Computer and System Sciences - Special issue: 26th annual ACM symposium on the theory of computing & STOC'94, May 23–25, 1994, and second annual European conference on computational learning theory (EuroCOLT'95), March 13–15, 1995
A game of prediction with expert advice

Journal of Computer and System Sciences - Special issue on the eighth annual workshop on computational learning theory, July 5–8, 1995
The Nonstochastic Multiarmed Bandit Problem

SIAM Journal on Computing
Potential-Based Algorithms in On-Line Prediction and Game Theory

Machine Learning
Prediction, Learning, and Games

Prediction, Learning, and Games
Regret Minimization Under Partial Monitoring

Mathematics of Operations Research
Minimizing regret with label efficient prediction

IEEE Transactions on Information Theory

Regret to the best vs. regret to the average

Machine Learning
Better algorithms for benign bandits

SODA '09 Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms
The follow perturbed leader algorithm protected from unbounded one-step losses

ALT'09 Proceedings of the 20th international conference on Algorithmic learning theory
Algorithm selection as a bandit problem with unbounded losses

LION'10 Proceedings of the 4th international conference on Learning and intelligent optimization
Online Learning in Case of Unbounded Losses Using Follow the Perturbed Leader Algorithm

The Journal of Machine Learning Research
Better Algorithms for Benign Bandits

The Journal of Machine Learning Research
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization

The Journal of Machine Learning Research
Adaptive and optimal online linear regression on l1-balls

ALT'11 Proceedings of the 22nd international conference on Algorithmic learning theory
Competing against the best nearest neighbor filter in regression

ALT'11 Proceedings of the 22nd international conference on Algorithmic learning theory
Regret minimization algorithms for pricing lookback options

ALT'11 Proceedings of the 22nd international conference on Algorithmic learning theory
Pricing exotic derivatives using regret minimization

SAGT'11 Proceedings of the 4th international conference on Algorithmic game theory
Algorithm portfolio selection as a bandit problem with unbounded losses

Annals of Mathematics and Artificial Intelligence
Lower bounds on individual sequence regret

ALT'12 Proceedings of the 23rd international conference on Algorithmic Learning Theory
Forecasting electricity consumption by aggregating specialized experts

Machine Learning
Online Multiple Kernel Classification

Machine Learning
Sparsity regret bounds for individual sequences in online linear regression

The Journal of Machine Learning Research
Adaptive and optimal online linear regression on ℓ1-balls

Theoretical Computer Science
Combining initial segments of lists

Theoretical Computer Science

Abstract

This work studies external regret in sequential prediction games with both positive and negative payoffs. External regret measures the difference between the payoff obtained by the forecasting strategy and the payoff of the best action. In this setting, we derive new and sharper regret bounds for the well-known exponentially weighted average forecaster and for a second forecaster with a different multiplicative update rule. Our analysis has two main advantages: first, no preliminary knowledge about the payoff sequence is needed, not even its range; second, our bounds are expressed in terms of sums of squared payoffs, replacing larger first-order quantities appearing in previous bounds. In addition, our most refined bounds have the natural and desirable property of being stable under rescalings and general translations of the payoff sequence.
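As an illustration of the quantities named in the abstract, the following is a minimal sketch of an exponentially weighted average forecaster and its external regret on a payoff sequence with both positive and negative entries. It is not the tuning analysed in the paper: the fixed learning rate `eta`, the function name `exp_weights_regret`, and the toy payoff sequence are all assumptions made for this example, whereas the paper's point is precisely that no prior knowledge of the payoffs is needed to tune the forecaster.

```python
import math


def exp_weights_regret(payoffs, eta):
    """External regret of an exponentially weighted average forecaster.

    payoffs: list of rounds; each round is a list with one payoff per action
             (payoffs may be positive or negative).
    eta:     fixed learning rate (an assumption of this sketch).
    """
    n_actions = len(payoffs[0])
    cumulative = [0.0] * n_actions  # cumulative payoff of each action
    forecaster_total = 0.0          # cumulative (expected) payoff of the forecaster

    for round_payoffs in payoffs:
        # Weights proportional to exp(eta * cumulative payoff); subtracting
        # the maximum before exponentiating avoids overflow and does not
        # change the normalised probabilities.
        top = max(cumulative)
        weights = [math.exp(eta * (c - top)) for c in cumulative]
        norm = sum(weights)
        probs = [w / norm for w in weights]

        # The forecaster earns the probability-weighted average payoff.
        forecaster_total += sum(p * x for p, x in zip(probs, round_payoffs))

        # Only after predicting are the cumulative payoffs updated.
        cumulative = [c + x for c, x in zip(cumulative, round_payoffs)]

    # External regret: payoff of the best single action minus the forecaster's.
    return max(cumulative) - forecaster_total


# Toy sequence (hypothetical): action 0 always pays +1, action 1 always pays -1.
toy = [[1.0, -1.0] for _ in range(50)]
regret = exp_weights_regret(toy, eta=0.5)

# Adding the same constant to every payoff leaves this regret unchanged.
shifted = [[x + 10.0 for x in row] for row in toy]
regret_shifted = exp_weights_regret(shifted, eta=0.5)
```

With a fixed `eta`, the regret of this sketch is unchanged when every payoff is translated by the same constant, but not when the payoffs are rescaled. Stability under both rescalings and translations is the property the abstract claims for the paper's most refined bounds.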