GA-Ensemble: a genetic algorithm for robust ensembles

  • Authors:
  • Dong-Yop Oh; J. Brian Gray

  • Affiliations:
  • Computer Information Systems and Quantitative Methods Department, The University of Texas, Pan American, Edinburg, USA 78539-2999; Department of Information Systems, Statistics and Management Science, The University of Alabama, Tuscaloosa, USA 35487-0226

  • Venue:
  • Computational Statistics
  • Year:
  • 2013

Abstract

Many simple and complex methods have been developed to solve the classification problem. Boosting is one of the best-known techniques for improving the accuracy of classifiers. However, boosting is prone to overfitting with noisy data, and the final model is difficult to interpret. Some boosting methods, including AdaBoost, are also very sensitive to outliers. In this article we propose a new method, GA-Ensemble, which directly solves for the set of weak classifiers and their associated weights using a genetic algorithm. The genetic algorithm uses a new penalized fitness function that limits the number of weak classifiers and controls the effects of outliers by maximizing an appropriately chosen $p$th percentile of margins. We compare the test set error rates of GA-Ensemble, AdaBoost, and GentleBoost (an outlier-resistant version of AdaBoost) using several artificial data sets and real-world data sets from the UC Irvine Machine Learning Repository. GA-Ensemble is found to be more resistant to outliers and to yield simpler predictive models than AdaBoost and GentleBoost.
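The abstract does not give the exact form of the penalized fitness function, so the following is only a minimal sketch of the idea under stated assumptions: margins are normalized as $y_i \sum_j w_j h_j(x_i) / \sum_j |w_j|$, the percentile uses the nearest-rank definition, and the penalty is a constant times the number of weak classifiers with nonzero weight. The function names, the `penalty` parameter, and the toy data are all hypothetical and not taken from the paper.

```python
import math

def margins(weights, predictions, labels):
    """Normalized margin of each observation.

    predictions[j][i] is the output (+1 or -1) of weak classifier j on
    observation i; labels[i] is the true class (+1 or -1).
    """
    total = sum(abs(w) for w in weights)
    if total == 0:
        return [0.0 for _ in labels]
    return [
        y * sum(w * h[i] for w, h in zip(weights, predictions)) / total
        for i, y in enumerate(labels)
    ]

def percentile(values, p):
    """p-th percentile (0 <= p <= 100), nearest-rank definition (assumed)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

def penalized_fitness(weights, predictions, labels, p=20, penalty=0.01):
    """Assumed fitness: p-th percentile of margins minus a size penalty."""
    active = sum(1 for w in weights if w != 0)
    return percentile(margins(weights, predictions, labels), p) - penalty * active

# Toy data: ten observations; the last one is a mislabeled outlier.
labels = [1, 1, 1, 1, 1, -1, -1, -1, -1, 1]
h0 = [1, 1, 1, 1, 1, -1, -1, -1, -1, -1]   # correct except on the outlier
h1 = [1, 1, 1, 1, -1, -1, -1, -1, -1, 1]   # fits the outlier, errs on observation 5
predictions = [h0, h1]

simple_model = [1.0, 0.0]    # uses only h0
complex_model = [1.0, 0.5]   # adds h1 to chase the outlier

for p in (0, 20):
    print(p,
          penalized_fitness(simple_model, predictions, labels, p=p),
          penalized_fitness(complex_model, predictions, labels, p=p))
```

With `p=0` the criterion reduces to the minimum margin, which rewards the larger ensemble for accommodating the outlier. With `p=20` the lowest margin is ignored, so the single-classifier ensemble scores higher. In a genetic algorithm, a function of this kind would serve as the objective evaluated for each candidate weight vector.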