A Hybrid Approach Handling Imbalanced Datasets

  • Authors: Paolo Soda

  • Affiliations: Integrated Research Centre, Medical Informatics & Computer Science Laboratory, University Campus Bio-Medico of Rome, Rome, Italy

  • Venue: ICIAP '09 Proceedings of the 15th International Conference on Image Analysis and Processing
  • Year: 2009

Abstract

Several binary classification problems exhibit an imbalanced class distribution, which affects how the system learns. Traditional machine learning algorithms are biased towards the majority class and therefore produce poor predictive accuracy on the minority one. To overcome this limitation, many approaches have been proposed to build artificially balanced training sets. Besides their specific drawbacks, they achieve more balanced per-class accuracies at the expense of global accuracy. This paper first reviews the most recent methods for coping with imbalanced datasets and then proposes a strategy that overcomes the main drawbacks of existing approaches. It is based on an ensemble of classifiers trained on balanced subsets of the original imbalanced training set, working in conjunction with a classifier trained on the original imbalanced dataset. The performance of the method has been evaluated on six public datasets, showing its effectiveness also in comparison with other approaches. The method also makes it possible to adjust the system's behaviour according to the operating scenario.
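The abstract does not say how the balanced-subset ensemble and the classifier trained on the imbalanced data are combined, so the following Python sketch is only one plausible reading, not the paper's method. It assumes the majority class is split into chunks of about the minority-class size, that each chunk is paired with all minority samples, and that a confidence threshold decides which component answers. The names `balanced_subsets`, `CentroidClassifier`, `hybrid_predict` and the `threshold` parameter are hypothetical, and the nearest-centroid learner is a toy stand-in for any base classifier.

```python
import random
from statistics import mean


def balanced_subsets(majority, minority, rng):
    """Split the majority class into chunks of about the minority size and
    pair each chunk with all minority samples (assumed balancing scheme)."""
    shuffled = majority[:]
    rng.shuffle(shuffled)
    size = len(minority)
    chunks = [shuffled[i:i + size] for i in range(0, len(shuffled), size)]
    return [(chunk, minority) for chunk in chunks]


class CentroidClassifier:
    """Toy nearest-centroid learner; a stand-in for any base classifier."""

    def fit(self, negatives, positives):
        self.c0 = [mean(col) for col in zip(*negatives)]  # majority centroid
        self.c1 = [mean(col) for col in zip(*positives)]  # minority centroid
        return self

    def score(self, x):
        """Return a value in [0, 1]; above 0.5 favours the minority class."""
        d0 = sum((a - b) ** 2 for a, b in zip(x, self.c0)) ** 0.5
        d1 = sum((a - b) ** 2 for a, b in zip(x, self.c1)) ** 0.5
        return 0.5 if d0 + d1 == 0 else d0 / (d0 + d1)


def hybrid_predict(x, full_clf, ensemble, threshold=0.8):
    """Assumed combination rule: trust the classifier trained on the
    imbalanced data when it is confident, else take the ensemble vote."""
    s = full_clf.score(x)
    if max(s, 1 - s) >= threshold:
        return int(s > 0.5)
    votes = sum(clf.score(x) > 0.5 for clf in ensemble)
    return int(2 * votes > len(ensemble))


# Synthetic imbalanced data: 90 majority samples, 10 minority samples.
rng = random.Random(0)
majority = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(90)]
minority = [[rng.gauss(5, 1), rng.gauss(5, 1)] for _ in range(10)]

full_clf = CentroidClassifier().fit(majority, minority)
ensemble = [CentroidClassifier().fit(neg, pos)
            for neg, pos in balanced_subsets(majority, minority, rng)]

print(hybrid_predict([0.0, 0.0], full_clf, ensemble))
print(hybrid_predict([5.0, 5.0], full_clf, ensemble))
```

In this sketch, `threshold` is the knob that changes the system's behaviour: a value of 0.5 always defers to the classifier trained on the imbalanced data, while a value of 1.0 almost always defers to the ensemble trained on balanced subsets.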