The impact of random samples in ensemble classifiers

  • Authors:
  • Paulo Fernandes;Lucelene Lopes;Duncan D. A. Ruiz

  • Affiliations:
  • PPGCC - FACIN - PUCRS, Porto Alegre, Brazil;PPGCC - FACIN - PUCRS, Porto Alegre, Brazil;PPGCC - FACIN - PUCRS, Porto Alegre, Brazil

  • Venue:
  • Proceedings of the 2010 ACM Symposium on Applied Computing
  • Year:
  • 2010

Abstract

Ensemble classifiers such as Bagging and Boosting are widespread in machine learning. However, most studies in this area rely on empirical comparisons that pay too little attention to the randomness inherent in these methods. This paper describes the dangers of experiments with ensemble classifiers by analyzing the efficiency of Bagging and Boosting over 32 different data sets. The experiments show that variations due to randomness often outweigh the advantages of one method over another reported in the literature. The main contribution of this paper is the claim, supported by statistical analysis, that no empirical comparison of ensemble classifiers can be scientifically sound unless it accounts for the random choices made.
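
To make the role of random choices concrete, the following is a minimal sketch, not the paper's experimental setup. It assumes a synthetic one-dimensional data set with label noise, decision stumps as base learners, and hypothetical helper names (`make_data`, `fit_stump`, `bagging_accuracy`, `seed_spread`). It trains the same Bagging configuration on the same data under several seeds and reports the mean and standard deviation of the resulting accuracy.

```python
import random
import statistics


def make_data(n, rng):
    """Synthetic 1-D two-class data with 20% label noise (illustrative only)."""
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 1.0)
        y = 1 if x > 0.5 else 0
        if rng.random() < 0.2:  # flip the label with probability 0.2
            y = 1 - y
        data.append((x, y))
    return data


def fit_stump(sample):
    """Return the threshold t minimizing training error of the rule 'x > t -> class 1'."""
    best_t, best_err = 0.5, len(sample) + 1
    for t, _ in sample:
        err = sum((1 if x > t else 0) != y for x, y in sample)
        if err < best_err:
            best_t, best_err = t, err
    return best_t


def bagging_accuracy(train, test, n_estimators, seed):
    """Train bagged stumps on bootstrap samples drawn with `seed`; return test accuracy."""
    rng = random.Random(seed)
    thresholds = []
    for _ in range(n_estimators):
        # Bootstrap sample: draw len(train) instances with replacement.
        boot = [rng.choice(train) for _ in train]
        thresholds.append(fit_stump(boot))
    correct = 0
    for x, y in test:
        votes = sum(1 for t in thresholds if x > t)
        pred = 1 if 2 * votes > len(thresholds) else 0  # majority vote
        correct += pred == y
    return correct / len(test)


def seed_spread(train, test, n_estimators, seeds):
    """Run the same configuration under each seed; return mean, std, and raw scores."""
    scores = [bagging_accuracy(train, test, n_estimators, s) for s in seeds]
    return statistics.mean(scores), statistics.stdev(scores), scores


# Fixed data, so any variation below comes only from the bootstrap seed.
data_rng = random.Random(0)
train = make_data(60, data_rng)
test = make_data(200, data_rng)

mean_acc, std_acc, scores = seed_spread(train, test, n_estimators=5, seeds=range(10))
print(f"accuracy over 10 seeds: mean={mean_acc:.3f}, std={std_acc:.3f}")
```

Under these assumptions, a difference between two methods that is smaller than the seed-induced standard deviation cannot be attributed to the methods themselves from a single run. That is the kind of comparison the abstract warns against.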