Start Globally, Optimize Locally, Predict Globally: Improving Performance on Imbalanced Data

Authors:
David A. Cieslak;Nitesh V. Chawla
Affiliations:
-;-
Venue:
ICDM '08 Proceedings of the 2008 Eighth IEEE International Conference on Data Mining
Year:
2008

Citing 0
Cited 3

An empirical study of applying ensembles of heterogeneous classifiers on imperfect data
PAKDD'09 Proceedings of the 13th Pacific-Asia international conference on Knowledge discovery and data mining: new frontiers in applied data mining

Borderline over-sampling for imbalanced data classification
International Journal of Knowledge Engineering and Soft Data Paradigms

Building decision trees for the multi-class imbalance problem
PAKDD'12 Proceedings of the 16th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part I

Quantified Score

Hi-index	0.00

Abstract

Class imbalance is a ubiquitous problem in supervised learning and has gained wide-scale attention in the literature. Perhaps the most prevalent solution is to apply sampling to training data in order to improve classifier performance. The typical approach applies uniform levels of sampling globally. However, we believe that data is typically multi-modal, which suggests sampling should be treated locally rather than globally. The purpose of this paper is to propose a framework which first identifies meaningful regions of data and then proceeds to find optimal sampling levels within each. This paper demonstrates that a global classifier trained on locally sampled data produces superior rank-orderings on a wide range of real-world and artificial datasets as compared to contemporary global sampling methods.
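
The local sampling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the regions have already been identified (each row carries a hypothetical region_id), it substitutes plain random undersampling of the majority class for whatever sampling methods the paper uses, and it takes the per-region levels (the hypothetical keep_fraction mapping) as given rather than searching for optimal values. The function name sample_locally is also an assumption made for illustration.

```python
import random
from collections import defaultdict


def sample_locally(rows, keep_fraction, seed=0):
    """Undersample the majority class (label 0) separately in each region.

    rows: list of (region_id, features, label) tuples; label 1 is the minority.
    keep_fraction: dict mapping region_id -> fraction of majority rows to keep.
                   These per-region levels are assumed inputs, not optimized here.
    Returns one pooled list intended for training a single global classifier.
    """
    rng = random.Random(seed)

    # Group rows by their (assumed, precomputed) region.
    by_region = defaultdict(list)
    for row in rows:
        by_region[row[0]].append(row)

    pooled = []
    for region, members in sorted(by_region.items()):
        minority = [r for r in members if r[2] == 1]
        majority = [r for r in members if r[2] == 0]
        # Regions without a listed level keep all of their majority rows.
        n_keep = round(keep_fraction.get(region, 1.0) * len(majority))
        pooled.extend(minority)
        pooled.extend(rng.sample(majority, n_keep))
    return pooled


# Toy data: two regions with different degrees of imbalance.
rows = (
    [("a", (i,), 0) for i in range(100)]
    + [("a", (i,), 1) for i in range(5)]
    + [("b", (i,), 0) for i in range(20)]
    + [("b", (i,), 1) for i in range(10)]
)
pooled = sample_locally(rows, {"a": 0.1, "b": 0.5})
```

In this toy run, region "a" keeps a tenth of its majority rows and region "b" keeps half, so each region is rebalanced to a different degree before the rows are pooled. A single uniform level applied to all rows could not treat the two regions differently, which is the contrast the abstract draws between global and local sampling.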