EXPONENTIATED GRADIENT VERSUS GRADIENT DESCENT FOR LINEAR PREDICTORS

  • Authors:
  • Jyrki Kivinen; Manfred Warmuth

  • Affiliations:
  • -;-

  • Venue:
  • -
  • Year:
  • 1994

Abstract

We consider two algorithms for on-line prediction based on a linear model. The algorithms are the well-known Gradient Descent (GD) algorithm and a new algorithm, which we call EG±. Both maintain a weight vector using simple updates. For the GD algorithm, the update subtracts the gradient of the squared error incurred on a prediction. The EG± algorithm uses the components of the gradient in the exponents of factors by which the weight vector is updated multiplicatively. We present worst-case loss bounds for EG± and compare them to previously known bounds for the GD algorithm. The bounds suggest that the losses of the two algorithms are in general incomparable, but that EG± has a much smaller loss when only a few components of the input are relevant for the predictions. We have performed experiments showing that our worst-case upper bounds are already quite tight on simple artificial data.
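To make the two update rules concrete, here is a minimal sketch of one on-line trial for each algorithm under squared loss. This is an illustration, not the paper's exact formulation: the learning rate eta, the total-weight parameter U, and the function names are hypothetical, and the EG± variant shown is the common two-weight-vector form (w⁺, w⁻) with renormalization, which matches the abstract's description of gradient components appearing in the exponents of multiplicative factors.

```python
import numpy as np

def gd_update(w, x, y, eta=0.1):
    """One GD trial: predict, then subtract the gradient of the
    squared error (y_hat - y)**2 with respect to w.
    (Sketch; the constant factor 2 is absorbed into eta.)"""
    y_hat = w @ x
    return w - eta * (y_hat - y) * x

def eg_pm_update(w_pos, w_neg, x, y, eta=0.1, U=1.0):
    """One EG± trial: each gradient component appears in the exponent
    of a multiplicative factor; the two positive weight vectors are
    renormalized so their total weight stays at U (assumed parameter)."""
    y_hat = (w_pos - w_neg) @ x
    r_pos = w_pos * np.exp(-eta * (y_hat - y) * x)
    r_neg = w_neg * np.exp(+eta * (y_hat - y) * x)
    Z = (r_pos.sum() + r_neg.sum()) / U  # normalization constant
    return r_pos / Z, r_neg / Z

# Hypothetical single trial on one example
x, y = np.array([1.0, -0.5, 2.0]), 0.7
w = gd_update(np.zeros(3), x, y)
w_pos = w_neg = np.full(3, 1.0 / 6)  # uniform start, total weight U = 1
w_pos, w_neg = eg_pm_update(w_pos, w_neg, x, y)
```

Note the structural difference the abstract points at: GD moves additively along the negative gradient, while EG± multiplies each weight by an exponential of the corresponding gradient component and renormalizes, keeping the weights positive. It is this multiplicative form that, per the bounds, gives EG± a much smaller loss when only a few input components are relevant.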