Identification of different types of minority class examples in imbalanced data

Authors:
Krystyna Napierala;Jerzy Stefanowski
Affiliations:
Institute of Computing Sciences, Poznań University of Technology, Poznań, Poland;Institute of Computing Sciences, Poznań University of Technology, Poznań, Poland
Venue:
HAIS'12 Proceedings of the 7th international conference on Hybrid Artificial Intelligent Systems - Volume Part II
Year:
2012

References (6):
Class imbalances versus small disjuncts. ACM SIGKDD Explorations Newsletter - Special issue on learning from imbalanced datasets
Learning from Imbalanced Data. IEEE Transactions on Knowledge and Data Engineering
Improved heterogeneous distance functions. Journal of Artificial Intelligence Research
An empirical study of the behavior of classifiers on imbalanced and overlapped data sets. CIARP'07 Proceedings of the Congress on pattern recognition 12th Iberoamerican conference on Progress in pattern recognition, image analysis and applications
Learning from imbalanced data in presence of noisy and borderline examples. RSCTC'10 Proceedings of the 7th international conference on Rough sets and current trends in computing
Addressing the classification with imbalanced data: open problems and new challenges on class distribution. HAIS'11 Proceedings of the 6th international conference on Hybrid artificial intelligent systems - Volume Part I

Cited by (2):
Class imbalance and the curse of minority hubs. Knowledge-Based Systems
Cost-sensitive decision tree ensembles for effective imbalanced classification. Applied Soft Computing

Abstract

The characteristics of the minority class distribution in imbalanced data are studied. Four types of minority examples (safe, borderline, rare and outlier) are distinguished and analysed. We propose a new method for identifying these examples in the data, based on analysing the local neighbourhoods of examples. Its application to UCI imbalanced datasets shows that the minority class is often scattered and contains few safe examples. This characteristic of the data distributions is also confirmed by a separate analysis using Multidimensional Scaling visualization. We examine the influence of these types of examples on six different classifiers learned over various real-world datasets. The experimental results show that individual classifiers differ in their sensitivity to the type of examples.
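
The neighbourhood-based idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the neighbourhood size, the distance function, or the cut-offs between the four types, so everything specific here is an assumption made for illustration: `k = 5`, plain Euclidean distance, and the thresholds on the number of same-class neighbours. The function name `label_minority_examples` is hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def label_minority_examples(X, y, minority, k=5):
    """Label each minority example by the share of its k nearest
    neighbours that also belong to the minority class.

    Assumed thresholds (not taken from the abstract), for k = 5:
      4-5 same-class neighbours -> safe
      2-3                       -> borderline
      1                         -> rare
      0                         -> outlier
    """
    labels = {}
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != minority:
            continue
        # Sort all other examples by distance to example i.
        others = sorted((dist(xi, xj), j) for j, xj in enumerate(X) if j != i)
        # Count minority-class examples among the k nearest.
        same = sum(1 for _, j in others[:k] if y[j] == minority)
        # Integer comparisons avoid floating-point issues with ratios.
        if 5 * same >= 4 * k:
            labels[i] = "safe"
        elif 5 * same >= 2 * k:
            labels[i] = "borderline"
        elif 5 * same >= k:
            labels[i] = "rare"
        else:
            labels[i] = "outlier"
    return labels


# Toy data: a compact minority cluster, a majority cluster,
# and one minority example lying inside the majority cluster.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.5, 1.5),
     (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5), (12, 12),
     (10.4, 10.6)]
y = ["min"] * 6 + ["maj"] * 6 + ["min"]
labels = label_minority_examples(X, y, minority="min")
```

In this toy data the six clustered minority examples have only minority neighbours, while the last example is surrounded by majority examples. For mixed nominal and numeric attributes a heterogeneous distance would be a more suitable choice than Euclidean distance; that substitution is left out to keep the sketch short.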