Boosting combines weak learners into a predictor with low empirical risk. Its dual constructs a high-entropy distribution under which the weak learners and the training labels are uncorrelated. This manuscript studies this primal-dual relationship under a broad family of losses, including the exponential loss of AdaBoost and the logistic loss, revealing:

• Weak learnability aids the whole loss family: for any ε > 0, O(ln(1/ε)) iterations suffice to produce a predictor whose empirical risk is ε-close to the infimum;
• The circumstances under which an empirical risk minimizer exists may be characterized in terms of the primal and dual problems, yielding a new proof of the known rate O(ln(1/ε));
• Arbitrary instances may be decomposed into the above two cases, granting the rate O(1/ε), with a matching lower bound provided for the logistic loss.
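As a concrete illustration of this primal-dual picture, here is a minimal sketch (not the manuscript's algorithm) of boosting as coordinate descent on the exponential loss: the normalized per-example losses form the dual distribution over examples, and the next weak learner is the one most correlated with the labels under that distribution. The margin-matrix encoding A[i, j] = y_i · h_j(x_i) and all names here are illustrative assumptions.

```python
import numpy as np

def boost(A, T=100, newton_steps=20):
    """Sketch: boosting as coordinate descent on the exponential loss.

    A is an (m examples) x (n weak learners) margin matrix with
    A[i, j] = y_i * h_j(x_i) in [-1, 1]; lam weights the weak learners.
    """
    m, n = A.shape
    lam = np.zeros(n)
    for _ in range(T):
        w = np.exp(-A @ lam)                 # per-example exponential losses
        q = w / w.sum()                      # dual distribution over examples
        j = int(np.argmax(np.abs(A.T @ q)))  # learner most correlated under q
        # Approximate line search along coordinate j via a few Newton steps;
        # under weak learnability the exact minimizer can be unbounded, so
        # the fixed step count also acts as a crude cap.
        alpha, a = 0.0, A[:, j]
        for _ in range(newton_steps):
            w = np.exp(-(A @ lam + alpha * a))
            g = -(w * a).sum()               # d/d(alpha) of the loss
            h = (w * a * a).sum()            # second derivative
            if h <= 1e-12:
                break
            alpha -= g / h
        lam[j] += alpha
    return lam
```

With ±1-valued weak learners, q above is exactly the high-entropy dual distribution described in the abstract: coordinate descent stops making progress precisely when every weak learner is uncorrelated with the training labels under q.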