In this paper we study boosting methods from a new perspective. We build on recent work by Efron et al. to show that boosting approximately (and in some cases exactly) minimizes its loss criterion subject to an l1 constraint on the coefficient vector. This helps explain the success of boosting with early stopping as regularized fitting of the loss criterion. For the two most commonly used criteria (exponential and binomial log-likelihood), we further show that as the constraint is relaxed, or equivalently as the boosting iterations proceed, the solution converges (in the separable case) to an "l1-optimal" separating hyperplane. We prove that this l1-optimal separating hyperplane has the property of maximizing the minimal l1-margin of the training data, as defined in the boosting literature. An interesting fundamental similarity between boosting and kernel support vector machines emerges: both can be described as methods for regularized optimization in high-dimensional predictor space, both use a computational trick to make the calculation practical, and both converge to margin-maximizing solutions. While this description is exact for SVMs, it holds for boosting only approximately.
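To make these claims concrete, here is a sketch of the two optimization problems the abstract refers to, written in assumed notation (none of it is fixed by the abstract itself): h_1, ..., h_J are the weak learners, beta is the coefficient vector, L is the exponential or binomial log-likelihood loss, and (x_i, y_i), i = 1, ..., n, with y_i in {-1, +1}, are the training data. The l1-constrained fit that boosting with early stopping approximately traces is

\[
\hat{\beta}(c) \;=\; \arg\min_{\|\beta\|_1 \le c} \; \sum_{i=1}^{n} L\!\Big(y_i,\; \sum_{j=1}^{J} \beta_j h_j(x_i)\Big),
\]

and the minimal l1-margin of the training data is

\[
m_1(\beta) \;=\; \min_{1 \le i \le n} \; \frac{y_i \sum_{j=1}^{J} \beta_j h_j(x_i)}{\|\beta\|_1}.
\]

The convergence statement then says that, in the separable case, the normalized solution \(\hat{\beta}(c)/\|\hat{\beta}(c)\|_1\) approaches \(\arg\max_{\|\beta\|_1 = 1} m_1(\beta)\) as the constraint c is relaxed toward infinity, i.e., as the boosting iterations proceed.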