An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants

  • Authors:
  • Eric Bauer; Ron Kohavi

  • Affiliations:
  • Computer Science Department, Stanford University, Stanford, CA 94305. ebauer@cs.stanford.edu; Blue Martini Software, 2600 Campus Dr. Suite 175, San Mateo, CA 94403. ronnyk@cs.stanford.edu

  • Venue:
  • Machine Learning
  • Year:
  • 1999

Abstract

Methods for voting classification algorithms, such as Bagging and AdaBoost, have been shown to be very successful in improving the accuracy of certain classifiers for artificial and real-world datasets. We review these algorithms and describe a large empirical study comparing several variants in conjunction with a decision tree inducer (three variants) and a Naive-Bayes inducer. The purpose of the study is to improve our understanding of why and when these algorithms, which use perturbation, reweighting, and combination techniques, affect classification error. We provide a bias and variance decomposition of the error to show how different methods and variants influence these two terms. This allowed us to determine that Bagging reduced variance of unstable methods, while boosting methods (AdaBoost and Arc-x4) reduced both the bias and variance of unstable methods but increased the variance for Naive-Bayes, which was very stable. We observed that Arc-x4 behaves differently than AdaBoost if reweighting is used instead of resampling, indicating a fundamental difference. Voting variants, some of which are introduced in this paper, include: pruning versus no pruning, use of probabilistic estimates, weight perturbations (Wagging), and backfitting of data. We found that Bagging improves when probabilistic estimates in conjunction with no-pruning are used, as well as when the data was backfit. We measure tree sizes and show an interesting positive correlation between the increase in the average tree size in AdaBoost trials and its success in reducing the error. We compare the mean-squared error of voting methods to non-voting methods and show that the voting methods lead to large and significant reductions in the mean-squared errors. Practical problems that arise in implementing boosting algorithms are explored, including numerical instabilities and underflows. We use scatterplots that graphically show how AdaBoost reweights instances, emphasizing not only “hard” areas but also outliers and noise.
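
The abstract contrasts Bagging with the boosting methods AdaBoost and Arc-x4, with resampling versus reweighting as one of the distinctions studied. The following is a minimal sketch of that contrast using a decision-tree base learner; the synthetic dataset, hyperparameters, and the hand-rolled arc_x4 helper are illustrative assumptions, not the paper's experimental setup.

```python
# Sketch: Bagging vs. AdaBoost vs. an Arc-x4-style resampling loop.
# Dataset, hyperparameters, and arc_x4/vote helpers are assumptions for illustration.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

def arc_x4(X, y, n_rounds=25, random_state=0):
    """Arc-x4: resample each round with probability proportional to 1 + m_i**4,
    where m_i counts how often instance i has been misclassified so far."""
    rng = np.random.RandomState(random_state)
    n = len(y)
    mistakes = np.zeros(n)
    members = []
    for _ in range(n_rounds):
        p = 1.0 + mistakes ** 4
        p /= p.sum()
        idx = rng.choice(n, size=n, replace=True, p=p)
        tree = DecisionTreeClassifier(random_state=random_state).fit(X[idx], y[idx])
        members.append(tree)
        mistakes += (tree.predict(X) != y)   # update per-instance mistake counts
    return members

def vote(members, X):
    """Unweighted majority vote over the ensemble members (as Arc-x4 uses)."""
    preds = np.array([m.predict(X) for m in members])
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, preds)

bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=25,
                            random_state=0).fit(X_tr, y_tr)
adaboost = AdaBoostClassifier(DecisionTreeClassifier(max_depth=3), n_estimators=25,
                              random_state=0).fit(X_tr, y_tr)
arc = arc_x4(X_tr, y_tr)

print("Bagging  error:", 1 - bagging.score(X_te, y_te))
print("AdaBoost error:", 1 - adaboost.score(X_te, y_te))
print("Arc-x4   error:", np.mean(vote(arc, X_te) != y_te))
```

Arc-x4 as sketched here uses resampling and unweighted voting; a reweighting variant would instead pass the per-instance weights directly to a weight-aware inducer rather than drawing a bootstrap sample.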
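
The bias and variance decomposition mentioned in the abstract can be approximated with a small Monte Carlo estimate: retrain the inducer on several resampled training sets and measure how much of its error reflects systematic deviation from the target (bias) versus spread across runs (variance). The sketch below follows a Kohavi-Wolpert-style zero-one-loss decomposition with the intrinsic-noise term assumed to be zero; the sampling scheme and formulas are a simplification for illustration, not the estimator used in the paper.

```python
# Sketch: Monte Carlo estimate of a bias^2/variance split for zero-one loss.
# The bootstrap sampling scheme and noise-free-target assumption are simplifications.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=1)
X_pool, X_eval, y_pool, y_eval = train_test_split(X, y, test_size=0.5, random_state=1)

def bias_variance(make_model, n_runs=20, seed=0):
    rng = np.random.RandomState(seed)
    n_classes = len(np.unique(y_pool))
    preds = np.empty((n_runs, len(y_eval)), dtype=int)
    for r in range(n_runs):
        # Each run trains on a fresh bootstrap sample of the training pool.
        idx = rng.choice(len(y_pool), size=len(y_pool), replace=True)
        preds[r] = make_model().fit(X_pool[idx], y_pool[idx]).predict(X_eval)
    # Per-point distribution of predicted classes across the n_runs models.
    p_hat = np.stack([(preds == c).mean(axis=0) for c in range(n_classes)], axis=1)
    p_true = np.eye(n_classes)[y_eval]            # noise-free target assumption
    bias2 = 0.5 * ((p_true - p_hat) ** 2).sum(axis=1).mean()
    variance = 0.5 * (1.0 - (p_hat ** 2).sum(axis=1)).mean()
    return bias2, variance

single_tree = bias_variance(lambda: DecisionTreeClassifier(random_state=0))
print("decision tree  bias^2=%.3f  variance=%.3f" % single_tree)
```

Wrapping the same base learner in a voting method (for example, BaggingClassifier) and recomputing the two terms is one way to reproduce, in miniature, the reported effect that Bagging mainly reduces the variance term for unstable inducers.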