GA-Ensemble: a genetic algorithm for robust ensembles

Authors:
Dong-Yop Oh;J. Brian Gray
Affiliations:
Computer Information Systems and Quantitative Methods Department, The University of Texas-Pan American, Edinburg, USA 78539-2999;Department of Information Systems, Statistics and Management Science, The University of Alabama, Tuscaloosa, USA 35487-0226
Venue:
Computational Statistics
Year:
2013

Abstract

Many simple and complex methods have been developed to solve the classification problem. Boosting is one of the best known techniques for improving the accuracy of classifiers. However, boosting is prone to overfitting with noisy data and the final model is difficult to interpret. Some boosting methods, including AdaBoost, are also very sensitive to outliers. In this article we propose a new method, GA-Ensemble, which directly solves for the set of weak classifiers and their associated weights using a genetic algorithm. The genetic algorithm utilizes a new penalized fitness function that limits the number of weak classifiers and controls the effects of outliers by maximizing an appropriately chosen $p$th percentile of margins. We compare the test set error rates of GA-Ensemble, AdaBoost, and GentleBoost (an outlier-resistant version of AdaBoost) using several artificial data sets and real-world data sets from the UC-Irvine Machine Learning Repository. GA-Ensemble is found to be more resistant to outliers and results in simpler predictive models than AdaBoost and GentleBoost.
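
The abstract does not give the exact form of the penalized fitness function, so the following is only a minimal sketch of the idea under stated assumptions: the fitness is taken to be the $p$th percentile of the normalized margins minus a linear penalty on the number of weak classifiers with nonzero weight. The function names, the nearest-rank percentile rule, the linear penalty, and the default values of `p` and `penalty` are all illustrative choices, not the authors' definitions.

```python
import math
from typing import List, Sequence


def margins(votes: Sequence[Sequence[int]], labels: Sequence[int],
            weights: Sequence[float]) -> List[float]:
    """Normalized margin y_i * sum_t w_t * h_t(x_i) / sum_t |w_t| per observation.

    votes[t][i] is the +1/-1 prediction of weak classifier t on observation i.
    """
    total = sum(abs(w) for w in weights)
    if total == 0:
        return [0.0] * len(labels)
    return [
        y * sum(w * votes[t][i] for t, w in enumerate(weights)) / total
        for i, y in enumerate(labels)
    ]


def percentile(values: Sequence[float], p: float) -> float:
    """p-th percentile (0 <= p <= 100) by the nearest-rank rule (an assumption)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def penalized_fitness(votes: Sequence[Sequence[int]], labels: Sequence[int],
                      weights: Sequence[float], p: float = 10.0,
                      penalty: float = 0.01) -> float:
    """Assumed fitness: p-th percentile of margins minus a size penalty.

    Ignoring the lowest p percent of margins keeps a few outliers from
    dominating the objective; the penalty term favors smaller ensembles.
    """
    used = sum(1 for w in weights if w != 0)
    return percentile(margins(votes, labels, weights), p) - penalty * used


# Toy data: three weak classifiers voting on five observations.
labels = [1, 1, -1, -1, 1]
votes = [
    [1, 1, -1, -1, -1],
    [1, -1, -1, -1, -1],
    [1, 1, 1, -1, 1],
]
weights = [0.5, 0.0, 0.5]  # the second classifier is dropped from the ensemble
fitness = penalized_fitness(votes, labels, weights, p=20.0, penalty=0.01)
```

In a genetic algorithm, each candidate solution would encode a weight vector such as `weights`, and a score of this kind would rank candidates for selection. Maximizing the minimum margin corresponds to `p` near 0, while a larger `p` lets the search disregard a larger share of low-margin observations.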