We study on-line generalized linear regression with multidimensional outputs, i.e., neural networks with multiple output nodes but no hidden nodes. At the final layer we allow transfer functions, such as the softmax function, that must consider the linear activations of all the output neurons. The weight vectors used to produce the linear activations are represented indirectly: we maintain separate parameter vectors and obtain each weight vector by applying a particular parameterization function to the corresponding parameter vector. Upon seeing a new example, the parameter vectors are updated additively, as in the usual gradient descent update. By choosing a nonlinear parameterization function between the parameter vectors and the weight vectors, however, we can make the resulting update of the weight vectors quite different from a true gradient descent update. To analyze such updates, we define a notion of a matching loss function and apply it both to the transfer function and to the parameterization function. The loss function that matches the transfer function measures the goodness of the algorithm's predictions. The loss function that matches the parameterization function serves both as a divergence between models, motivating the update rule, and as a measure of progress when analyzing the algorithm's performance relative to an arbitrary fixed model. The result is a unified treatment that generalizes earlier results for the gradient descent and exponentiated gradient algorithms to multidimensional outputs, including multiclass logistic regression.
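The core mechanism above — an additive update in parameter space combined with a nonlinear parameterization of the weights — can be sketched in a few lines. This is a minimal illustration, not the paper's exact algorithm: the function names, the choice of softmax as transfer function with cross-entropy as its matching loss, and the learning rate `eta` are assumptions made for the example. With the identity parameterization the update reduces to ordinary gradient descent on the weight matrix, while a componentwise-exponential parameterization induces a multiplicative (exponentiated-gradient-style) update on the weights, even though the parameters are always updated additively.

```python
import numpy as np

def softmax(a):
    """Transfer function for multiclass outputs (numerically stabilized)."""
    e = np.exp(a - a.max())
    return e / e.sum()

def log_loss_grad(x, y, y_hat):
    """Gradient of the matching (cross-entropy) loss w.r.t. the weight matrix."""
    return np.outer(y_hat - y, x)

def update(theta, x, y, eta, phi):
    """One additive update in parameter space.

    phi maps the parameter matrix theta to the weight matrix W.
    phi = identity  -> ordinary gradient descent on W.
    phi = np.exp    -> exponentiated-gradient-style multiplicative update on W.
    """
    W = phi(theta)                    # weights from parameters
    y_hat = softmax(W @ x)            # prediction via the transfer function
    theta_new = theta - eta * log_loss_grad(x, y, y_hat)  # additive step
    return theta_new
```

For example, with `phi=np.exp` the new weights are `np.exp(theta_new) = np.exp(theta) * np.exp(-eta * grad)`, i.e., a componentwise multiplicative rescaling of the old weights.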