An empirical comparison of three boosting algorithms on real data sets with artificial class noise

  • Authors:
  • Ross A. McDonald; David J. Hand; Idris A. Eckley

  • Affiliations:
  • Imperial College London; Imperial College London; Shell Research Ltd.

  • Venue:
  • MCS'03 Proceedings of the 4th International Conference on Multiple Classifier Systems
  • Year:
  • 2003

Abstract

Boosting algorithms are a means of building a strong ensemble classifier by aggregating a sequence of weak hypotheses. In this paper we consider three of the best-known boosting algorithms: Adaboost [9], Logitboost [11], and Brownboost [8]. These algorithms are adaptive, and work by maintaining a set of example and class weights which focus the attention of a base learner on the examples that are hardest to classify. We conduct an empirical study to compare the performance of these algorithms, measured in terms of overall test error rate, on five real data sets. The tests consist of a series of repeated validation samples. At each validation, we set aside one third of the data chosen at random as a test set, and fit the boosting algorithm to the remaining two thirds, using binary stumps as the base learner. At each stage we record the final training and test error rates, and report the average errors with a 95% confidence interval. We then add artificial class noise to our data sets by randomly reassigning 20% of class labels, and repeat our experiment. We find that Brownboost and Logitboost are less likely than Adaboost to overfit in this circumstance.
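
The evaluation protocol described above (repeated one-third/two-thirds random splits, boosting with decision stumps, and 20% artificial label noise) can be sketched roughly as follows. This minimal Python sketch uses scikit-learn's AdaBoostClassifier, whose default base learner is a depth-1 decision stump; it covers only Adaboost, since Logitboost and Brownboost are not provided by scikit-learn. The stand-in data set (breast cancer), the number of trials and boosting rounds, and the decision to flip binary labels for the noise step are illustrative assumptions, not the paper's actual setup.

    # Sketch of the repeated split-and-test protocol, under the assumptions above.
    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.model_selection import train_test_split

    def run_trials(X, y, noise_rate=0.0, n_trials=30, n_rounds=100, seed=0):
        rng = np.random.default_rng(seed)
        if noise_rate > 0:
            # Randomly reassign the given fraction of (binary 0/1) class labels
            # before the experiment, mimicking the artificial class noise step.
            y = y.copy()
            flip = rng.random(len(y)) < noise_rate
            y[flip] = 1 - y[flip]
        test_errors = []
        for t in range(n_trials):
            # Set aside one third of the data at random as a test set and fit
            # the booster to the remaining two thirds.
            X_tr, X_te, y_tr, y_te = train_test_split(
                X, y, test_size=1 / 3, random_state=seed + t)
            # AdaBoost with its default base learner, a depth-1 decision stump.
            clf = AdaBoostClassifier(n_estimators=n_rounds, random_state=seed + t)
            clf.fit(X_tr, y_tr)
            test_errors.append(1.0 - clf.score(X_te, y_te))
        errors = np.asarray(test_errors)
        # Mean test error with a normal-approximation 95% confidence interval.
        return errors.mean(), 1.96 * errors.std(ddof=1) / np.sqrt(n_trials)

    X, y = load_breast_cancer(return_X_y=True)
    for rate in (0.0, 0.2):
        mean_err, ci = run_trials(X, y, noise_rate=rate)
        print(f"noise={rate:.0%}: test error {mean_err:.3f} +/- {ci:.3f}")

Comparing the noiseless and 20%-noise runs in this way gives the kind of test-error summary the paper reports, though the paper's data sets and its Logitboost and Brownboost results are not reproduced here.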