We study on-line generalized linear regression with multidimensional outputs, i.e., neural networks with multiple output nodes but no hidden nodes. At the final layer we allow transfer functions, such as the softmax function, that must consider the linear activations of all the output neurons. The weight vectors used to produce the linear activations are represented indirectly: we maintain separate parameter vectors and obtain each weight vector by applying a particular parameterization function to the corresponding parameter vector. Upon seeing a new example, the parameter vectors are updated additively, as in the usual gradient descent update. By choosing a nonlinear parameterization function between the parameter vectors and the weight vectors, however, we can make the resulting update of the weight vectors quite different from a true gradient descent update. To analyze such updates, we define a notion of a matching loss function and apply it both to the transfer function and to the parameterization function. The loss function that matches the transfer function measures the goodness of the algorithm's predictions. The loss function that matches the parameterization function serves both as a divergence between models, motivating the update rule, and as a measure of progress when analyzing the algorithm's performance relative to an arbitrary fixed model. The result is a unified treatment that generalizes earlier results for the gradient descent and exponentiated gradient algorithms to multidimensional outputs, including multiclass logistic regression.
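The core mechanism above — an additive update in parameter space combined with a nonlinear parameterization of the weights — can be sketched in a few lines. This is a minimal illustration, not the paper's exact algorithm: the function names, the choice of softmax as transfer function with cross-entropy as its matching loss, and the learning rate `eta` are assumptions made for the example. With the identity parameterization the update reduces to ordinary gradient descent on the weight matrix, while a componentwise-exponential parameterization induces a multiplicative (exponentiated-gradient-style) update on the weights, even though the parameters are always updated additively.

```python
import numpy as np

def softmax(a):
    """Transfer function for multiclass outputs (numerically stabilized)."""
    e = np.exp(a - a.max())
    return e / e.sum()

def log_loss_grad(x, y, y_hat):
    """Gradient of the matching (cross-entropy) loss w.r.t. the weight matrix."""
    return np.outer(y_hat - y, x)

def update(theta, x, y, eta, phi):
    """One additive update in parameter space.

    phi maps the parameter matrix theta to the weight matrix W.
    phi = identity  -> ordinary gradient descent on W.
    phi = np.exp    -> exponentiated-gradient-style multiplicative update on W.
    """
    W = phi(theta)                    # weights from parameters
    y_hat = softmax(W @ x)            # prediction via the transfer function
    theta_new = theta - eta * log_loss_grad(x, y, y_hat)  # additive step
    return theta_new
```

For example, with `phi=np.exp` the new weights are `np.exp(theta_new) = np.exp(theta) * np.exp(-eta * grad)`, i.e., a componentwise multiplicative rescaling of the old weights.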