Identification of different types of minority class examples in imbalanced data

  • Authors:
  • Krystyna Napierala; Jerzy Stefanowski

  • Affiliations:
  • Institute of Computing Sciences, Poznań University of Technology, Poznań, Poland (both authors)

  • Venue:
  • HAIS'12 Proceedings of the 7th international conference on Hybrid Artificial Intelligent Systems - Volume Part II
  • Year:
  • 2012

Abstract

The characteristics of the minority class distribution in imbalanced data are studied. Four types of minority examples (safe, borderline, rare and outlier) are distinguished and analysed. We propose a new method for identifying these examples in the data, based on analysing the local neighbourhoods of examples. Its application to UCI imbalanced datasets shows that the minority class is often scattered and contains few safe examples. This characteristic of the data distributions is also confirmed by a separate analysis using Multidimensional Scaling visualization. We examine the influence of these types of examples on 6 different classifiers learned over various real-world datasets. The experimental results show that individual classifiers differ in their sensitivity to the type of examples.
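The abstract does not spell out how the local neighbourhood is analysed. The sketch below assumes the labeling rule commonly associated with this line of work: look at the k = 5 nearest neighbours of each minority example and count how many share its class (4 or 5 gives safe, 2 or 3 gives borderline, 1 gives rare, 0 gives outlier). Euclidean distance, the function names `neighbourhood_type` and `label_minority_examples`, and the toy data are illustrative assumptions, not the authors' implementation. Data with nominal attributes would need a different distance measure.

```python
import math
from collections import Counter

K = 5  # neighbourhood size assumed by the thresholds below


def neighbourhood_type(same_class_count):
    """Map the number of same-class neighbours (out of K = 5) to a type.

    Assumed thresholds: 5-4 safe, 3-2 borderline, 1 rare, 0 outlier.
    """
    if same_class_count >= 4:
        return "safe"
    if same_class_count >= 2:
        return "borderline"
    if same_class_count == 1:
        return "rare"
    return "outlier"


def label_minority_examples(X, y, minority_label):
    """Assign a type to every minority example from its K nearest neighbours.

    X is a list of numeric feature tuples and y the matching class labels.
    Euclidean distance is an assumption made for this sketch.
    Returns {index: type} for minority examples only.
    """
    types = {}
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != minority_label:
            continue
        # Rank every other example by distance and keep the K closest.
        ranked = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i
        )
        neighbours = [j for _, j in ranked[:K]]
        same = sum(1 for j in neighbours if y[j] == minority_label)
        types[i] = neighbourhood_type(same)
    return types


# Hypothetical 1-D toy data: four well-separated groups of six points each,
# so every minority point's 5 neighbours are the rest of its own group.
groups = [
    ([0.0, 0.1, 0.2, 0.3, 0.4, 0.5], ["min"] * 6),
    ([100.0, 99.8, 99.9, 100.1, 100.2, 100.3], ["min"] + ["maj"] * 5),
    ([200.0, 200.05, 199.7, 199.8, 200.2, 200.3], ["min"] * 2 + ["maj"] * 4),
    ([300.0, 300.1, 300.2, 299.8, 299.9, 300.3], ["min"] * 3 + ["maj"] * 3),
]
X = [(v,) for values, _ in groups for v in values]
y = [label for _, labels in groups for label in labels]

types = label_minority_examples(X, y, "min")
print(Counter(types.values()))
```

Each toy group is isolated on purpose, so the counts follow directly from the group composition: the all-minority group yields safe examples, while the groups with one, two and three minority points among six yield an outlier, rare and borderline examples respectively.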