Adaptive signal processing algorithms: stability and performance
Exponentiated gradient versus gradient descent for linear predictors
Information and Computation
General convergence results for linear discriminant updates
COLT '97: Proceedings of the Tenth Annual Conference on Computational Learning Theory
Natural gradient works efficiently in learning
Neural Computation
Relative loss bounds for multidimensional regression problems
NIPS '97: Proceedings of the 1997 Conference on Advances in Neural Information Processing Systems 10
The robustness of the p-norm algorithms
COLT '99: Proceedings of the Twelfth Annual Conference on Computational Learning Theory
Regret bounds for prediction problems
COLT '99: Proceedings of the Twelfth Annual Conference on Computational Learning Theory
Relative loss bounds for multidimensional regression problems
Machine Learning
Fundamentals of Artificial Neural Networks
Optimal and Adaptive Signal Processing
Feedforward Neural Network Methodology
Approximate solutions to Markov decision processes
Pattern Classification (2nd Edition)
Convergence of exponentiated gradient algorithms
IEEE Transactions on Signal Processing
Krylov-proportionate adaptive filtering techniques not limited to sparse systems
IEEE Transactions on Signal Processing
Expert mixture methods for adaptive channel equalization
ICANN/ICONIP '03: Proceedings of the 2003 Joint International Conference on Artificial Neural Networks and Neural Information Processing
A family of gradient descent algorithms for learning linear functions in an online setting is considered. The family includes the classical LMS algorithm as well as new variants such as the Exponentiated Gradient (EG) algorithm of Kivinen and Warmuth. The algorithms are based on prior distributions defined on the weight space: techniques from differential geometry are used to develop each algorithm as a gradient descent iteration with respect to the natural gradient in the Riemannian structure induced by the prior distribution. The proposed framework subsumes the notion of "link functions".
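As a concrete illustration, and not code from the paper itself, the following NumPy sketch contrasts two members of this family on online linear regression with squared loss: the LMS update, an additive gradient step, and the normalized EG update of Kivinen and Warmuth, which pushes the same gradient through an exponential link function and can be read as a gradient step under a different geometry on the weight space. The function names, learning rates, and the sparse synthetic target are illustrative assumptions, not from the source.

```python
# Minimal sketch contrasting LMS (additive gradient descent) with the
# normalized EG update for online linear prediction with squared loss.
# Learning rates and the synthetic target below are illustrative choices.
import numpy as np

def lms_step(w, x, y, eta=0.01):
    """LMS: additive step w - eta * grad of 0.5 * (w.x - y)^2."""
    err = w @ x - y
    return w - eta * err * x

def eg_step(w, x, y, eta=0.1):
    """Normalized EG: multiplicative update on positive weights summing to 1."""
    err = w @ x - y
    v = w * np.exp(-eta * err * x)   # exponentiated gradient step (link function)
    return v / v.sum()               # re-normalize back onto the simplex

rng = np.random.default_rng(0)
d = 5
target = np.zeros(d)
target[0] = 1.0                      # sparse target on the simplex
w_lms = np.zeros(d)
w_eg = np.full(d, 1.0 / d)           # uniform prior over the weight simplex

for _ in range(500):
    x = rng.normal(size=d)
    y = target @ x
    w_lms = lms_step(w_lms, x, y)
    w_eg = eg_step(w_eg, x, y)

print("LMS:", np.round(w_lms, 3))
print("EG :", np.round(w_eg, 3))
```

On a sparse target like this one, the EG iterate typically concentrates on the relevant coordinate faster than LMS, the behaviour that the relative loss bounds cited above formalize.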