Out-of-bag estimation of the optimal sample size in bagging

Authors:
Gonzalo Martínez-Muñoz;Alberto Suárez
Affiliations:
C/Francisco Tomás y Valiente, 11 Escuela Politécnica Superior, Universidad Autónoma de Madrid, Madrid 28049, Spain;C/Francisco Tomás y Valiente, 11 Escuela Politécnica Superior, Universidad Autónoma de Madrid, Madrid 28049, Spain
Venue:
Pattern Recognition
Year:
2010

Abstract

The performance of m-out-of-n bagging, with and without replacement, is analyzed as a function of the sampling ratio (m/n). Standard bagging resamples with replacement to generate bootstrap samples of the same size as the original training set (m_wr = n). Without-replacement methods typically use half samples (m_wor = n/2). These sample sizes are arbitrary and need not be optimal in terms of the classification performance of the ensemble. We propose using out-of-bag estimates of the generalization accuracy to select a near-optimal value of the sampling ratio. Ensembles whose classifiers are trained on independent samples of the size that minimizes the out-of-bag error of the ensemble generally outperform standard bagging and can be built efficiently.
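
The selection procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The base learner (a one-feature threshold "stump"), the synthetic data, the candidate grid of ratios, and all function names are assumptions made for illustration. For each candidate ratio, an ensemble is trained on samples of size m = ratio * n, every training example is classified by majority vote of the ensemble members that did not see it, and the ratio with the lowest out-of-bag error is kept.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Fit a one-feature threshold classifier (hypothetical stand-in base learner)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            for left, right in ((0, 1), (1, 0)):
                err = sum((left if x[j] <= t else right) != yi
                          for x, yi in zip(X, y))
                if best is None or err < best[0]:
                    best = (err, j, t, left, right)
    _, j, t, left, right = best
    return lambda x: left if x[j] <= t else right


def oob_error(X, y, m, n_estimators, replace, rng):
    """Out-of-bag error of an ensemble whose members each see m training examples."""
    n = len(X)
    votes = [Counter() for _ in range(n)]
    for _ in range(n_estimators):
        if replace:
            idx = [rng.randrange(n) for _ in range(m)]   # with replacement
        else:
            idx = rng.sample(range(n), m)                # without replacement
        model = fit_stump([X[i] for i in idx], [y[i] for i in idx])
        in_bag = set(idx)
        for i in range(n):
            if i not in in_bag:                          # only out-of-bag votes count
                votes[i][model(X[i])] += 1
    scored = [(v.most_common(1)[0][0], yi) for v, yi in zip(votes, y) if v]
    if not scored:                                       # e.g. m = n without replacement
        return float("inf")
    return sum(pred != yi for pred, yi in scored) / len(scored)


def select_sampling_ratio(X, y, ratios, n_estimators=15, replace=True, seed=0):
    """Return the candidate ratio m/n with the lowest out-of-bag error, plus all errors."""
    rng = random.Random(seed)
    n = len(X)
    errors = {r: oob_error(X, y, max(1, round(r * n)), n_estimators, replace, rng)
              for r in ratios}
    return min(errors, key=errors.get), errors


# Toy two-class problem (synthetic data, for illustration only).
data_rng = random.Random(1)
X = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(120)]
y = [int(a + b > 0) for a, b in X]

ratios = [0.1, 0.2, 0.5, 1.0]
best_ratio, errors = select_sampling_ratio(X, y, ratios)
print("selected m/n:", best_ratio)
print("out-of-bag errors:", errors)
```

In this sketch the out-of-bag votes come from ensembles that are trained anyway, so no separate validation set is held out. When sampling without replacement, a ratio of 1.0 leaves no out-of-bag examples, which the sketch treats as an unusable candidate by returning an infinite error.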