Communications of the ACM
Cryptographic limitations on learning Boolean formulae and finite automata
STOC '89 Proceedings of the twenty-first annual ACM symposium on Theory of computing
C4.5: programs for machine learning
An introduction to computational learning theory
A decision-theoretic generalization of on-line learning and an application to boosting
Journal of Computer and System Sciences - Special issue: 26th Annual ACM Symposium on the Theory of Computing (STOC '94), May 23–25, 1994, and Second Annual European Conference on Computational Learning Theory (EuroCOLT '95), March 13–15, 1995
COLT '98 Proceedings of the eleventh annual conference on Computational learning theory
Natural gradient works efficiently in learning
Neural Computation
On the boosting ability of top-down decision tree learning algorithms
Journal of Computer and System Sciences
Improved Boosting Algorithms Using Confidence-rated Predictions
Machine Learning - The Eleventh Annual Conference on Computational Learning Theory
Linear hinge loss and average margin
Proceedings of the 1998 conference on Advances in Neural Information Processing Systems 11
ECML '95 Proceedings of the 8th European Conference on Machine Learning
Logistic Regression, AdaBoost and Bregman Distances
COLT '00 Proceedings of the Thirteenth Annual Conference on Computational Learning Theory
Information geometry of U-Boost and Bregman divergence
Neural Computation
Totally corrective boosting algorithms that maximize the margin
ICML '06 Proceedings of the 23rd international conference on Machine learning
Clustering with Bregman Divergences
The Journal of Machine Learning Research
A Real generalization of discrete AdaBoost
Artificial Intelligence
Information-theoretic metric learning
Proceedings of the 24th international conference on Machine learning
SODA '07 Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms
Journal of Artificial Intelligence Research
Real boosting a la carte with an application to boosting oblique decision trees
IJCAI'07 Proceedings of the 20th international joint conference on Artificial intelligence
The p-norm generalization of the LMS algorithm for adaptive filtering
IEEE Transactions on Signal Processing
On the optimality of conditional expectation as a Bregman predictor
IEEE Transactions on Information Theory
In a seminal paper, Amari (1998) proved that learning can be made more efficient when one uses the intrinsic Riemannian structure of the algorithm's parameter space to point the gradient towards better solutions. In this paper, we show that many learning algorithms, including various boosting algorithms for linear separators, the most popular top-down decision-tree induction algorithms, and some on-line learning algorithms, arise as instances of a generalization of Amari's natural gradient to certain non-Riemannian spaces. These algorithms exploit an intrinsic dual geometric structure of the parameter space tied to the particular integral losses they minimize. We unify several of them, such as AdaBoost, additive regression with the square loss, minimization of the logistic loss, and the top-down induction performed in CART and C4.5, into a single algorithm for which we prove general convergence to the optimum and give explicit convergence rates under very weak assumptions. As a consequence, many of the classification-calibrated surrogates of Bartlett et al. (2006) admit efficient minimization algorithms.
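To make the surrogate-loss view concrete, the sketch below shows discrete AdaBoost, one of the algorithms the abstract names, as greedy minimization of the exponential surrogate loss sum_i exp(-y_i H(x_i)). This is a minimal illustration under assumed choices (decision stumps as weak learners, a synthetic toy dataset), not the unified algorithm or the convergence analysis developed in the paper.

```python
# Minimal sketch: discrete AdaBoost as greedy minimization of the exponential
# surrogate loss. Assumptions (not from the paper): decision stumps as weak
# learners, a synthetic 2-D toy dataset, labels in {-1, +1}.
import numpy as np

def stump_predict(X, feature, threshold, polarity):
    """Decision stump: +1/-1 prediction from one thresholded feature."""
    return polarity * np.where(X[:, feature] <= threshold, 1.0, -1.0)

def best_stump(X, y, w):
    """Exhaustively pick the stump with the smallest weighted error under w."""
    best, best_err = None, np.inf
    for feature in range(X.shape[1]):
        for threshold in np.unique(X[:, feature]):
            for polarity in (+1.0, -1.0):
                pred = stump_predict(X, feature, threshold, polarity)
                err = np.sum(w[pred != y])
                if err < best_err:
                    best_err, best = err, (feature, threshold, polarity)
    return best, best_err

def adaboost(X, y, n_rounds=10):
    """Returns a list of (alpha, stump) pairs; y must take values in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)                        # example weights, kept normalized
    ensemble = []
    for _ in range(n_rounds):
        stump, err = best_stump(X, y, w)
        err = np.clip(err, 1e-12, 1.0 - 1e-12)
        alpha = 0.5 * np.log((1.0 - err) / err)    # leveraging coefficient
        pred = stump_predict(X, *stump)
        w *= np.exp(-alpha * y * pred)             # exponential (multiplicative) reweighting
        w /= w.sum()                               # re-normalize to a distribution
        ensemble.append((alpha, stump))
    return ensemble

def predict(ensemble, X):
    """Sign of the weighted vote of the selected stumps."""
    H = sum(alpha * stump_predict(X, *stump) for alpha, stump in ensemble)
    return np.sign(H)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)  # toy linearly separable labels
    ens = adaboost(X, y, n_rounds=10)
    print("training accuracy:", np.mean(predict(ens, X) == y))
```

Swapping the exponential reweighting step for the update induced by another classification-calibrated surrogate (for instance the logistic or squared loss) gives, informally, the other members of the family of algorithms the abstract refers to.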