The AdaBoost algorithm was designed to combine many "weak" hypotheses that perform slightly better than random guessing into a "strong" hypothesis that has very low error. We study the rate at which AdaBoost iteratively converges to the minimum of the "exponential loss." Unlike previous work, our proofs do not require a weak-learning assumption, nor do they require that the exponential loss be minimized by a finite parameter vector. Our first result shows that the exponential loss of AdaBoost's computed parameter vector is at most ε more than that of any parameter vector of ℓ1-norm bounded by B, after a number of rounds that is at most polynomial in B and 1/ε. We also provide lower bounds showing that this polynomial dependence is necessary. Our second result is that within C/ε iterations, AdaBoost achieves a value of the exponential loss that is at most ε more than the best possible value, where C depends on the data set. We show that this dependence of the rate on ε is optimal up to constant factors; that is, at least Ω(1/ε) rounds are necessary to come within ε of the optimal exponential loss.
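The exponential loss referred to above is L(λ) = (1/m) Σ_i exp(-y_i Σ_j λ_j h_j(x_i)), and AdaBoost can be viewed as greedily decreasing it one coordinate (one weak hypothesis) at a time. The following is a minimal sketch of that view, assuming binary labels y ∈ {-1, +1} and a finite pool of ±1-valued weak hypotheses passed in as callables; the function and variable names, and the exhaustive search over the pool, are illustrative assumptions rather than the paper's own setup.

    import numpy as np

    def adaboost_exp_loss(X, y, weak_hypotheses, rounds):
        """Sketch of AdaBoost as coordinate-wise minimization of the exponential loss.

        X: (m, d) array of training points; y: (m,) array of labels in {-1, +1};
        weak_hypotheses: list of callables h(X) -> (m,) array of values in {-1, +1}.
        """
        m = X.shape[0]
        D = np.full(m, 1.0 / m)        # distribution over training examples
        F = np.zeros(m)                 # combined score of the ensemble on each point
        for _ in range(rounds):
            # choose the weak hypothesis with the largest (absolute) edge under D
            edges = np.array([np.sum(D * y * h(X)) for h in weak_hypotheses])
            j = int(np.argmax(np.abs(edges)))
            gamma = edges[j]            # edge of the chosen hypothesis
            alpha = 0.5 * np.log((1.0 + gamma) / (1.0 - gamma))
            preds = weak_hypotheses[j](X)
            F += alpha * preds
            # reweight misclassified points and renormalize
            D = D * np.exp(-alpha * y * preds)
            D /= D.sum()
        exp_loss = np.mean(np.exp(-y * F))   # exponential loss after the final round
        return F, exp_loss

In this sketch, each round's step size alpha is the exact line-search minimizer of the exponential loss along the chosen coordinate, which is the sense in which the abstract's convergence rates (polynomial in B and 1/ε, or C/ε) are measured in rounds.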