Communications of the ACM
Cryptographic limitations on learning Boolean formulae and finite automata
STOC '89 Proceedings of the twenty-first annual ACM symposium on Theory of computing
C4.5: programs for machine learning
An introduction to computational learning theory
A decision-theoretic generalization of on-line learning and an application to boosting
Journal of Computer and System Sciences - Special issue: 26th Annual ACM Symposium on the Theory of Computing (STOC '94), May 23–25, 1994, and Second Annual European Conference on Computational Learning Theory (EuroCOLT '95), March 13–15, 1995
COLT '98 Proceedings of the eleventh annual conference on Computational learning theory
Natural gradient works efficiently in learning
Neural Computation
On the boosting ability of top-down decision tree learning algorithms
Journal of Computer and System Sciences
Improved Boosting Algorithms Using Confidence-rated Predictions
Machine Learning - The Eleventh Annual Conference on Computational Learning Theory
Linear hinge loss and average margin
Proceedings of the 1998 conference on Advances in Neural Information Processing Systems 11
ECML '95 Proceedings of the 8th European Conference on Machine Learning
Logistic Regression, AdaBoost and Bregman Distances
COLT '00 Proceedings of the Thirteenth Annual Conference on Computational Learning Theory
Information geometry of U-Boost and Bregman divergence
Neural Computation
Totally corrective boosting algorithms that maximize the margin
ICML '06 Proceedings of the 23rd international conference on Machine learning
Clustering with Bregman Divergences
The Journal of Machine Learning Research
A Real generalization of discrete AdaBoost
Artificial Intelligence
Information-theoretic metric learning
Proceedings of the 24th international conference on Machine learning
SODA '07 Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms
Journal of Artificial Intelligence Research
Real boosting a la carte with an application to boosting oblique decision trees
IJCAI'07 Proceedings of the 20th international joint conference on Artificial intelligence
The p-norm generalization of the LMS algorithm for adaptive filtering
IEEE Transactions on Signal Processing
On the optimality of conditional expectation as a Bregman predictor
IEEE Transactions on Information Theory
In a seminal paper, Amari (1998) proved that learning can be made more efficient when one uses the intrinsic Riemannian structure of the algorithm's parameter space to point the gradient towards better solutions. In this paper, we show that many learning algorithms, including various boosting algorithms for linear separators, the most popular top-down decision-tree induction algorithms, and some on-line learning algorithms, arise as instances of a generalization of Amari's natural gradient to certain non-Riemannian spaces. These algorithms exploit an intrinsic dual geometric structure of the parameter space tied to the particular integral losses they minimize. We unify several of them, such as AdaBoost, additive regression with the square loss, minimization of the logistic loss, and the top-down induction performed in CART and C4.5, into a single algorithm for which we prove general convergence to the optimum and give explicit convergence rates under very weak assumptions. As a consequence, many of the classification-calibrated surrogates of Bartlett et al. (2006) admit efficient minimization algorithms.
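To make the surrogate-loss view concrete, the sketch below shows discrete AdaBoost, one of the algorithms the abstract names, as greedy minimization of the exponential surrogate loss sum_i exp(-y_i H(x_i)). This is a minimal illustration under assumed choices (decision stumps as weak learners, a synthetic toy dataset), not the unified algorithm or the convergence analysis developed in the paper.

```python
# Minimal sketch: discrete AdaBoost as greedy minimization of the exponential
# surrogate loss. Assumptions (not from the paper): decision stumps as weak
# learners, a synthetic 2-D toy dataset, labels in {-1, +1}.
import numpy as np

def stump_predict(X, feature, threshold, polarity):
    """Decision stump: +1/-1 prediction from one thresholded feature."""
    return polarity * np.where(X[:, feature] <= threshold, 1.0, -1.0)

def best_stump(X, y, w):
    """Exhaustively pick the stump with the smallest weighted error under w."""
    best, best_err = None, np.inf
    for feature in range(X.shape[1]):
        for threshold in np.unique(X[:, feature]):
            for polarity in (+1.0, -1.0):
                pred = stump_predict(X, feature, threshold, polarity)
                err = np.sum(w[pred != y])
                if err < best_err:
                    best_err, best = err, (feature, threshold, polarity)
    return best, best_err

def adaboost(X, y, n_rounds=10):
    """Returns a list of (alpha, stump) pairs; y must take values in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)                        # example weights, kept normalized
    ensemble = []
    for _ in range(n_rounds):
        stump, err = best_stump(X, y, w)
        err = np.clip(err, 1e-12, 1.0 - 1e-12)
        alpha = 0.5 * np.log((1.0 - err) / err)    # leveraging coefficient
        pred = stump_predict(X, *stump)
        w *= np.exp(-alpha * y * pred)             # exponential (multiplicative) reweighting
        w /= w.sum()                               # re-normalize to a distribution
        ensemble.append((alpha, stump))
    return ensemble

def predict(ensemble, X):
    """Sign of the weighted vote of the selected stumps."""
    H = sum(alpha * stump_predict(X, *stump) for alpha, stump in ensemble)
    return np.sign(H)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)  # toy linearly separable labels
    ens = adaboost(X, y, n_rounds=10)
    print("training accuracy:", np.mean(predict(ens, X) == y))
```

Swapping the exponential reweighting step for the update induced by another classification-calibrated surrogate (for instance the logistic or squared loss) gives, informally, the other members of the family of algorithms the abstract refers to.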