Addressing the classification with imbalanced data: open problems and new challenges on class distribution

  • Authors:
  • A. Fernández; S. García; F. Herrera

  • Affiliations:
  • Dept. of Computer Science, University of Jaén; Dept. of Computer Science and A.I., University of Granada; Dept. of Computer Science, University of Jaén

  • Venue:
  • HAIS'11 Proceedings of the 6th international conference on Hybrid artificial intelligent systems - Volume Part I
  • Year:
  • 2011

Abstract

Learning classifiers from datasets with imbalanced class distributions is an important problem in data mining. The issue arises when one class is represented by far fewer examples than the other classes. Its presence in many real-world applications has attracted growing attention from researchers. This work briefly reviews the main issues of the problem and describes two common approaches for dealing with imbalance: sampling and cost-sensitive learning. We also pay special attention to some open problems. In particular, we discuss the intrinsic data characteristics of the imbalanced classification problem, whose study may open new paths toward improving current models: the size of the dataset, small disjuncts, the overlap between classes, and the data fracture between the training and test distributions.
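As a rough illustration of the two families of approaches named in the abstract, the sketch below shows one simple instance of each: random oversampling (a sampling method) and inverse-frequency class weights (one common way to set misclassification costs for cost-sensitive learning). It is a minimal sketch under stated assumptions, not the authors' method: the function names `random_oversample` and `inverse_frequency_weights`, the toy data, and the specific weighting formula are all illustrative choices that do not come from the paper.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen examples of each smaller class until
    every class has as many examples as the largest class.
    (Hypothetical helper; one simple sampling strategy among many.)"""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        # Original examples of this class, used as the resampling pool.
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def inverse_frequency_weights(y):
    """Per-class cost n / (k * n_c): rarer classes get larger weights.
    (Assumed weighting scheme; the abstract does not prescribe one.)"""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {label: n / (k * c) for label, c in counts.items()}


# Toy dataset: nine majority examples (class 0), one minority example (class 1).
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.7], [0.8], [0.9], [5.0]]
y = [0] * 9 + [1]

X_bal, y_bal = random_oversample(X, y)
weights = inverse_frequency_weights(y)
```

In this toy case the rebalanced set holds nine examples of each class, and the minority class receives a weight nine times larger than the majority class. Note that neither technique addresses the intrinsic data characteristics the abstract lists (small disjuncts, class overlap, data fracture); duplicating examples in particular adds no new information about the minority class.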