C4.5 consolidation process: an alternative to intelligent oversampling methods in class imbalance problems

  • Authors:
  • Iñaki Albisua;Olatz Arbelaitz;Ibai Gurrutxaga;Javier Muguerza;Jesús M. Pérez

  • Affiliations:
  • Computer Science Faculty, University of the Basque Country, Donostia, Spain;Computer Science Faculty, University of the Basque Country, Donostia, Spain;Computer Science Faculty, University of the Basque Country, Donostia, Spain;Computer Science Faculty, University of the Basque Country, Donostia, Spain;Computer Science Faculty, University of the Basque Country, Donostia, Spain

  • Venue:
  • CAEPIA'11: Proceedings of the 14th International Conference on Advances in Artificial Intelligence: Spanish Association for Artificial Intelligence
  • Year:
  • 2011

Abstract

In real-world problems addressed with data mining techniques, it is common to find data in which one class has far fewer examples than the others. Much work has been devoted to these problems, known as class imbalance problems. Most of it focuses on data resampling techniques that improve the training data, usually by balancing the classes, before a classical learning algorithm is applied. Another option is to modify the learning algorithm itself. As a mixture of these two options, we proposed the Consolidation process, which combines a prior resampling of the training data with a modification of the learning algorithm, in this study C4.5. In this work, we experimented with 14 databases and compared the effectiveness of each strategy based on the AUC values achieved. The results show that consolidation obtains the best performance when compared with five well-known resampling methods, including SMOTE and some of its variants. Thus, the consolidation process combined with subsamples that balance the class distribution is appropriate for class imbalance problems requiring explanation and high discriminating capacity.
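
The abstract names two ingredients without giving details: subsamples that balance the class distribution, and AUC as the comparison metric. The following is a minimal Python sketch of those two ingredients only. It does not implement C4.5 or the consolidation step, and the function names `balanced_subsamples` and `auc`, the random undersampling strategy, and all parameters are assumptions made for illustration, not the authors' procedure.

```python
import random
from collections import defaultdict


def balanced_subsamples(labels, n_subsamples, seed=0):
    """Hypothetical helper: build index lists in which every class
    contributes as many examples as the rarest class has."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    # Size of the rarest class sets the per-class quota (assumption).
    size = min(len(idxs) for idxs in by_class.values())
    subsamples = []
    for _ in range(n_subsamples):
        chosen = []
        for idxs in by_class.values():
            # Sample without replacement within each class.
            chosen.extend(rng.sample(idxs, size))
        subsamples.append(sorted(chosen))
    return subsamples


def auc(scores, labels, positive=1):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

With 90 majority and 10 minority labels, each subsample returned by `balanced_subsamples` holds 20 indices, 10 per class. A classifier trained on such subsamples could then be scored on held-out data with `auc`.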