Out-of-bag estimation of the optimal sample size in bagging

  • Authors:
  • Gonzalo Martínez-Muñoz;Alberto Suárez

  • Affiliations:
  • C/Francisco Tomás y Valiente, 11 Escuela Politécnica Superior, Universidad Autónoma de Madrid, Madrid 28049, Spain;C/Francisco Tomás y Valiente, 11 Escuela Politécnica Superior, Universidad Autónoma de Madrid, Madrid 28049, Spain

  • Venue:
  • Pattern Recognition
  • Year:
  • 2010

Abstract

The performance of m-out-of-n bagging with and without replacement is analyzed as a function of the sampling ratio (m/n). Standard bagging uses resampling with replacement to generate bootstrap samples of the same size as the original training set, $m_{wor} = n$. Without-replacement methods typically use half samples, $m_{wr} = n/2$. These choices of sample size are arbitrary and need not be optimal in terms of the classification performance of the ensemble. We propose using out-of-bag estimates of the generalization accuracy to select a near-optimal value for the sampling ratio. Ensembles of classifiers trained on independent samples whose size minimizes the out-of-bag error of the ensemble generally outperform standard bagging and can be built efficiently.
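
The selection procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 1-nearest-neighbour base learner, the synthetic two-class data, the ensemble size, the candidate ratios, and the helper names (`nearest_label`, `oob_error`, `select_ratio`, `make_data`) are all assumptions made for illustration. For each candidate ratio, an ensemble is built on samples of size m = ratio * n, every training instance is classified only by the members that did not see it, and the ratio with the lowest out-of-bag error is kept.

```python
import random
from collections import Counter

def nearest_label(train, x):
    """1-nearest-neighbour prediction (assumed base learner, for illustration)."""
    closest = min(train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))
    return closest[1]

def oob_error(data, ratio, n_estimators=15, replace=True, rng=None):
    """Out-of-bag error of an m-out-of-n bagging ensemble with m = ratio * n."""
    rng = rng or random.Random(0)
    n = len(data)
    m = max(1, round(ratio * n))
    votes = [Counter() for _ in range(n)]
    for _ in range(n_estimators):
        if replace:
            idx = [rng.randrange(n) for _ in range(m)]
        else:
            idx = rng.sample(range(n), m)
        in_bag = set(idx)
        sample = [data[i] for i in idx]
        # Each ensemble member votes only on instances it was not trained on.
        for i in range(n):
            if i not in in_bag:
                votes[i][nearest_label(sample, data[i][0])] += 1
    scored = [(v.most_common(1)[0][0], data[i][1]) for i, v in enumerate(votes) if v]
    if not scored:
        # No instance was ever out of bag (e.g. m = n without replacement).
        return float("inf")
    return sum(pred != true for pred, true in scored) / len(scored)

def select_ratio(data, ratios, **kwargs):
    """Return the candidate ratio with the lowest out-of-bag error, plus all errors."""
    errors = {r: oob_error(data, r, **kwargs) for r in ratios}
    return min(errors, key=errors.get), errors

def make_data(n, rng):
    """Synthetic data: two overlapping 2-D Gaussian classes (an assumption)."""
    data = []
    for _ in range(n):
        label = rng.randrange(2)
        data.append(((rng.gauss(label, 1.0), rng.gauss(label, 1.0)), label))
    return data

rng = random.Random(42)
data = make_data(120, rng)
ratios = [0.1, 0.2, 0.5, 1.0]
best, errors = select_ratio(data, ratios, rng=rng)
print("OOB error per ratio:", errors)
print("selected ratio:", best)
```

Because the out-of-bag votes come from the same ensembles that would be used for prediction, this selection needs no separate validation set. Passing `replace=False` gives the without-replacement variant; in that case a ratio of 1.0 leaves no out-of-bag instances, which the sketch reports as an infinite error so that the ratio is never selected.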