On-line evaluation and prediction using linear functions
COLT '97 Proceedings of the tenth annual conference on Computational learning theory
Analysis of two gradient-based algorithms for on-line regression
COLT '97 Proceedings of the tenth annual conference on Computational learning theory
The binary exponentiated gradient algorithm for learning linear functions
COLT '97 Proceedings of the tenth annual conference on Computational learning theory
Minimax relative loss analysis for sequential prediction algorithms using parametric hypotheses
COLT '98 Proceedings of the eleventh annual conference on Computational learning theory
The robustness of the p-norm algorithms
COLT '99 Proceedings of the twelfth annual conference on Computational learning theory
Probability theory for the Brier game
Theoretical Computer Science
Relative Loss Bounds for Multidimensional Regression Problems
Machine Learning
Relative Loss Bounds for Temporal-Difference Learning
Machine Learning
Learning Intermediate Concepts
ALT '01 Proceedings of the 12th International Conference on Algorithmic Learning Theory
Learning Additive Models Online with Fast Evaluating Kernels
COLT '01/EuroCOLT '01 Proceedings of the 14th Annual Conference on Computational Learning Theory and 5th European Conference on Computational Learning Theory
Multiple-Instance Learning of Real-Valued Geometric Patterns
Annals of Mathematics and Artificial Intelligence
Tracking the best linear predictor
The Journal of Machine Learning Research
The Robustness of the p-Norm Algorithms
Machine Learning
Competing with wild prediction rules
Machine Learning
Air quality modeling: From deterministic to stochastic approaches
Computers & Mathematics with Applications
Leading strategies in competitive on-line prediction
Theoretical Computer Science
Aggregating Algorithm for a Space of Analytic Functions
ALT '08 Proceedings of the 19th international conference on Algorithmic Learning Theory
Learning rates of gradient descent algorithm for classification
Journal of Computational and Applied Mathematics
Limited stochastic meta-descent for kernel-based online learning
Neural Computation
Incomplete tree search using adaptive probing
IJCAI'01 Proceedings of the 17th international joint conference on Artificial intelligence - Volume 1
Adaptive fuzzy filtering in a deterministic setting
IEEE Transactions on Fuzzy Systems
Online Learning with Samples Drawn from Non-identical Distributions
The Journal of Machine Learning Research
Competing with stationary prediction strategies
COLT'07 Proceedings of the 20th annual conference on Learning theory
Steady-state MSE performance analysis of mixture approaches to adaptive filtering
IEEE Transactions on Signal Processing
Worst-case absolute loss bounds for linear learning algorithms
AAAI'97/IAAI'97 Proceedings of the fourteenth national conference on artificial intelligence and ninth conference on Innovative applications of artificial intelligence
An identity for kernel ridge regression
ALT'10 Proceedings of the 21st international conference on Algorithmic learning theory
Adaptive and optimal online linear regression on l1-balls
ALT'11 Proceedings of the 22nd international conference on Algorithmic learning theory
Relative loss bounds for on-line density estimation with the exponential family of distributions
UAI'99 Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence
Leading strategies in competitive on-line prediction
ALT'06 Proceedings of the 17th international conference on Algorithmic Learning Theory
Competing with wild prediction rules
COLT'06 Proceedings of the 19th annual conference on Learning Theory
On-line regression competitive with reproducing kernel Hilbert spaces
TAMC'06 Proceedings of the Third international conference on Theory and Applications of Models of Computation
An identity for kernel ridge regression
Theoretical Computer Science
We study the performance of gradient descent (GD) applied to the problem of online linear prediction in arbitrary inner product spaces. We prove worst-case bounds on the sum of the squared prediction errors under various assumptions about the amount of a priori information available on the sequence to be predicted. The algorithms we use are variants and extensions of online GD. Although our algorithms always predict using linear functions as hypotheses, none of our results requires the data to be linearly related; in fact, the bounds on the total prediction loss are typically expressed relative to the total loss of the best fixed linear predictor with bounded norm. All of the upper bounds are tight to within constant factors, and matching lower bounds are provided in some cases. Finally, we apply our results to the problem of online prediction for classes of smooth functions.
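The protocol the abstract describes — predict with a linear hypothesis, observe the outcome, take a gradient step on the squared loss, and compare cumulative loss against the best fixed linear predictor — can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm; the learning rate `eta` and the synthetic data are assumptions for demonstration only.

```python
import numpy as np

def online_gd(X, y, eta=0.05):
    """Online gradient descent for linear prediction with squared loss.

    At each round t the learner predicts w . x_t, observes y_t, and
    updates w by a gradient step on the squared error (y_hat - y_t)^2.
    Returns the final weights and the cumulative prediction loss.
    """
    n, d = X.shape
    w = np.zeros(d)
    total_loss = 0.0
    for t in range(n):
        x_t, y_t = X[t], y[t]
        y_hat = w @ x_t                      # linear prediction
        total_loss += (y_hat - y_t) ** 2
        w -= eta * 2.0 * (y_hat - y_t) * x_t  # gradient of squared loss
    return w, total_loss

# Synthetic linearly-related data (an assumption for this demo; the
# bounds in the paper do not require linearity).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.standard_normal(200)

w, gd_loss = online_gd(X, y)

# Comparator: total loss of the best fixed linear predictor in hindsight
# (the least-squares solution).
w_star, *_ = np.linalg.lstsq(X, y, rcond=None)
best_loss = float(np.sum((X @ w_star - y) ** 2))
print(gd_loss, best_loss)
```

The quantity of interest in the relative-loss literature is the regret `gd_loss - best_loss`; the worst-case bounds in the paper control this gap as a function of the comparator's loss and norm.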