A regularized boosting method is introduced, in which regularization is achieved through a penalization function. Oracle inequalities show that the method is model adaptive. The rate of convergence of the probability of misclassification is investigated. It is shown that, for quite a large class of distributions, the probability of error converges to the Bayes risk at a rate faster than n^{-(V+2)/(4(V+1))}, where V is the VC dimension of the "base" class whose elements are combined by boosting to obtain the aggregated classifier. The dimension-independent nature of these rates may partially explain the good behavior of boosting methods in practical problems. Under Tsybakov's noise condition the rate of convergence is even faster. We investigate the conditions needed to obtain such rates for different base classes. The special case of boosting with decision stumps is studied in detail, and we characterize the class of classifiers realizable by aggregating decision stumps. It is shown that some versions of boosting work especially well in high-dimensional logistic additive models. It also appears that adding limited label noise to the training data may in certain cases improve convergence, as has also been suggested by other authors.
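For orientation on the stated rate: a base class of VC dimension V = 1 gives an exponent of (1+2)/(4·2) = 3/8, and the exponent decreases toward 1/4 as V grows, independently of the input dimension. As a concrete illustration of the aggregation scheme described above, the sketch below boosts decision stumps under an explicit l1 budget on the combination coefficients, used here as a crude stand-in for regularization through a penalization function; it is not the estimator analyzed in the paper. The names (`regularized_stump_boosting`, `lam`, `n_rounds`) are illustrative, and the exponential-loss reweighting is one common choice of convex surrogate.

```python
import numpy as np


def fit_stump(X, y, w):
    """Return the decision stump (feature, threshold, sign) with the
    smallest weighted 0-1 error under sample weights w."""
    best_err, best = np.inf, (0, 0.0, 1)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            pred = np.where(X[:, j] > t, 1, -1)
            for s in (1, -1):
                err = w[s * pred != y].sum()
                if err < best_err:
                    best_err, best = err, (j, t, s)
    return best, best_err


def regularized_stump_boosting(X, y, n_rounds=100, lam=0.01):
    """Boost decision stumps with exponential-loss reweighting, stopping once
    the l1 norm of the coefficients would exceed the budget 1/lam.  This is
    only a sketch of penalized aggregation, not the paper's estimator."""
    n = X.shape[0]
    w = np.full(n, 1.0 / n)
    stumps, alphas = [], []
    for _ in range(n_rounds):
        (j, t, s), err = fit_stump(X, y, w)
        err = min(max(err, 1e-12), 1 - 1e-12)
        if err >= 0.5:                        # no stump better than chance
            break
        alpha = 0.5 * np.log((1 - err) / err)
        if sum(alphas) + alpha > 1.0 / lam:   # l1 budget exhausted
            break
        stumps.append((j, t, s))
        alphas.append(alpha)
        pred = s * np.where(X[:, j] > t, 1, -1)
        w *= np.exp(-alpha * y * pred)        # exponential-loss reweighting
        w /= w.sum()
    return stumps, alphas


def predict(X, stumps, alphas):
    """Sign of the aggregated stump classifier."""
    f = np.zeros(X.shape[0])
    for (j, t, s), a in zip(stumps, alphas):
        f += a * s * np.where(X[:, j] > t, 1, -1)
    return np.sign(f)
```

On data with labels in {-1, +1}, calling `regularized_stump_boosting(X, y)` and then `predict(X_test, stumps, alphas)` yields the aggregated classifier; shrinking `lam` enlarges the set of admissible combinations, mirroring the complexity trade-off that the oracle inequalities quantify.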