Evolutionary Training Set Selection to Optimize C4.5 in Imbalanced Problems

  • Authors:
  • Salvador García; Francisco Herrera

  • Venue:
  • HIS '08 Proceedings of the 2008 8th International Conference on Hybrid Intelligent Systems
  • Year:
  • 2008

Abstract

Classification in imbalanced domains is a recent challenge in machine learning. A classification problem is imbalanced when the data contain many examples of one class and few of the other, and the under-represented class is the one of greater interest. One of the most widely used techniques for tackling this problem is to preprocess the data before the learning process. This preprocessing can be done through under-sampling, which removes examples, mainly from the majority class, or through over-sampling, which replicates or generates new minority examples. This contribution proposes an under-sampling procedure based on evolutionary algorithms that performs training set selection to optimize the models obtained by the C4.5 decision tree. The proposal has been compared with other under-sampling and over-sampling techniques; the results are very competitive in terms of accuracy, and the obtained models are more interpretable.
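The general idea of evolutionary training set selection can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify the chromosome encoding, the fitness function, or the genetic operators, so everything below is an assumption made for illustration. Each individual is a boolean mask over the training examples, fitness is the geometric mean of the per-class recall on the full training set, and a one-level decision stump stands in for C4.5 to keep the example dependency-free. The names `train_stump`, `g_mean`, `fitness`, and `evolve_selection` are hypothetical.

```python
import random
from math import sqrt


def train_stump(X, y):
    """Fit a one-level decision tree (stand-in for C4.5, an assumption)."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for left, right in ((0, 1), (1, 0)):
                errors = sum((left if row[f] <= t else right) != label
                             for row, label in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, f, t, left, right)
    _, f, t, left, right = best
    return lambda row: left if row[f] <= t else right


def g_mean(model, X, y):
    """Geometric mean of the recall on class 0 and on class 1."""
    hits, totals = {0: 0, 1: 0}, {0: 0, 1: 0}
    for row, label in zip(X, y):
        totals[label] += 1
        hits[label] += model(row) == label
    return sqrt(hits[0] / totals[0] * hits[1] / totals[1])


def fitness(mask, X, y):
    """Train on the selected subset, score on the full training set."""
    Xs = [row for row, keep in zip(X, mask) if keep]
    ys = [label for label, keep in zip(y, mask) if keep]
    if len(set(ys)) < 2:  # a subset with a single class cannot be learned from
        return 0.0
    return g_mean(train_stump(Xs, ys), X, y)


def evolve_selection(X, y, pop_size=20, generations=20, p_mut=0.05, seed=0):
    """Simple generational GA over boolean selection masks (illustrative)."""
    rng = random.Random(seed)
    n = len(X)
    # Seed the population with the full training set as a baseline individual.
    pop = [[True] * n]
    pop += [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        scored = sorted(((fitness(m, X, y), m) for m in pop),
                        key=lambda sm: sm[0], reverse=True)
        pop = [m for _, m in scored[:2]]  # elitism: keep the two best masks
        while len(pop) < pop_size:
            # Tournament selection of two parents, uniform crossover, bit-flip mutation.
            a = max(rng.sample(scored, 3), key=lambda sm: sm[0])[1]
            b = max(rng.sample(scored, 3), key=lambda sm: sm[0])[1]
            child = [rng.choice(pair) for pair in zip(a, b)]
            pop.append([(not g) if rng.random() < p_mut else g for g in child])
    return max(pop, key=lambda m: fitness(m, X, y))


if __name__ == "__main__":
    # Synthetic imbalanced data: 40 majority (class 0) and 6 minority (class 1) examples.
    data_rng = random.Random(1)
    X = [[data_rng.gauss(0.0, 1.0)] for _ in range(40)]
    X += [[data_rng.gauss(1.5, 1.0)] for _ in range(6)]
    y = [0] * 40 + [1] * 6
    mask = evolve_selection(X, y)
    print("examples kept:", sum(mask), "of", len(X))
    print("g-mean, full set:", fitness([True] * len(X), X, y))
    print("g-mean, selected:", fitness(mask, X, y))
```

Because the full training set is included in the initial population and the best masks are carried over unchanged, the selected subset never scores worse than the unsampled baseline on this fitness measure. Scoring on the training data itself is a simplification; a held-out split or cross-validation would give a more honest estimate.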