Imbalanced datasets occur in many domains, such as fraud detection, cancer detection, and the web, and in such domains the class of interest often corresponds to rarely occurring events. It is therefore important to perform well on these rare classes while maintaining reasonable overall accuracy. Although imbalanced datasets can be difficult to learn from, previous research suggests that the skewed class distribution is not necessarily what poses problems for learning. Hence, when learning the rare class becomes problematic, the skewed class distribution is not necessarily to blame; rather, the imbalance may be a byproduct of other hidden intrinsic difficulties. This paper tries to shed some light on this issue of learning from imbalanced datasets. We propose using data complexity models to profile datasets in order to make connections with imbalanced datasets, which can potentially lead to better learning approaches. We extend our previous work with an improved implementation of the CODE framework in order to tackle a more difficult learning challenge. Despite the increased difficulty, CODE still achieves reasonable performance in profiling the data complexity of imbalanced datasets.
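To make the idea of profiling a dataset by its complexity more concrete, the sketch below computes two simple descriptors for a two-class problem: the imbalance ratio and Fisher's discriminant ratio, a measure commonly used in the data complexity literature. This is only an illustration under stated assumptions. It is not the CODE framework, whose internals are not described here, and the function names and toy data are hypothetical.

```python
from collections import Counter
from statistics import mean, pvariance


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def fisher_ratio(rows, labels):
    """Largest per-feature value of (mu_a - mu_b)^2 / (var_a + var_b).

    Assumes exactly two classes and numeric features. Higher values
    mean at least one feature separates the classes well.
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    a = [r for r, y in zip(rows, labels) if y == classes[0]]
    b = [r for r, y in zip(rows, labels) if y == classes[1]]
    best = 0.0
    for j in range(len(rows[0])):
        col_a = [r[j] for r in a]
        col_b = [r[j] for r in b]
        gap = (mean(col_a) - mean(col_b)) ** 2
        spread = pvariance(col_a) + pvariance(col_b)
        if spread == 0:
            # Both classes are constant on this feature.
            ratio = float("inf") if gap > 0 else 0.0
        else:
            ratio = gap / spread
        best = max(best, ratio)
    return best


# Hypothetical toy data: four majority examples, two minority examples.
rows = [(0.0, 1.0), (2.0, 3.0), (0.0, 1.0), (2.0, 3.0), (4.0, 2.0), (6.0, 2.0)]
labels = [0, 0, 0, 0, 1, 1]

if __name__ == "__main__":
    print("imbalance ratio:", imbalance_ratio(labels))
    print("Fisher ratio:", fisher_ratio(rows, labels))
```

In this toy set the classes are skewed two to one, yet the first feature separates them cleanly. That is the kind of case the abstract points to: the skew alone need not make the rare class hard to learn.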