The AdaBoost algorithm was designed to combine many "weak" hypotheses that perform slightly better than random guessing into a "strong" hypothesis that has very low error. We study the rate at which AdaBoost iteratively converges to the minimum of the "exponential loss." Unlike previous work, our proofs do not require a weak-learning assumption, nor do they require that the exponential loss be minimized by a finite parameter vector. Our first result shows that the exponential loss of AdaBoost's computed parameter vector is at most ε more than that of any parameter vector of ℓ1-norm bounded by B, after a number of rounds that is at most polynomial in B and 1/ε. We also provide lower bounds showing that this polynomial dependence is necessary. Our second result is that within C/ε iterations, AdaBoost achieves a value of the exponential loss that is at most ε more than the best possible value, where C depends on the data set. We show that this dependence of the rate on ε is optimal up to constant factors; that is, at least Ω(1/ε) rounds are necessary to come within ε of the optimal exponential loss.
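The exponential loss referred to above is L(λ) = (1/m) Σ_i exp(-y_i Σ_j λ_j h_j(x_i)), and AdaBoost can be viewed as greedily decreasing it one coordinate (one weak hypothesis) at a time. The following is a minimal sketch of that view, assuming binary labels y ∈ {-1, +1} and a finite pool of ±1-valued weak hypotheses passed in as callables; the function and variable names, and the exhaustive search over the pool, are illustrative assumptions rather than the paper's own setup.

    import numpy as np

    def adaboost_exp_loss(X, y, weak_hypotheses, rounds):
        """Sketch of AdaBoost as coordinate-wise minimization of the exponential loss.

        X: (m, d) array of training points; y: (m,) array of labels in {-1, +1};
        weak_hypotheses: list of callables h(X) -> (m,) array of values in {-1, +1}.
        """
        m = X.shape[0]
        D = np.full(m, 1.0 / m)        # distribution over training examples
        F = np.zeros(m)                 # combined score of the ensemble on each point
        for _ in range(rounds):
            # choose the weak hypothesis with the largest (absolute) edge under D
            edges = np.array([np.sum(D * y * h(X)) for h in weak_hypotheses])
            j = int(np.argmax(np.abs(edges)))
            gamma = edges[j]            # edge of the chosen hypothesis
            alpha = 0.5 * np.log((1.0 + gamma) / (1.0 - gamma))
            preds = weak_hypotheses[j](X)
            F += alpha * preds
            # reweight misclassified points and renormalize
            D = D * np.exp(-alpha * y * preds)
            D /= D.sum()
        exp_loss = np.mean(np.exp(-y * F))   # exponential loss after the final round
        return F, exp_loss

In this sketch, each round's step size alpha is the exact line-search minimizer of the exponential loss along the chosen coordinate, which is the sense in which the abstract's convergence rates (polynomial in B and 1/ε, or C/ε) are measured in rounds.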