Several studies have pointed out that class imbalance is a bottleneck for the performance of standard supervised learning systems. However, a complete understanding of how this problem affects learning performance is still lacking. In previous work we showed that performance degradation is not caused solely by class imbalance, but is also related to the degree of class overlap. In this work, we take our research a step further by investigating sampling strategies that aim to balance the training set. Our results show that these sampling strategies usually improve performance on highly imbalanced data sets with highly overlapped classes. In addition, over-sampling methods seem to outperform under-sampling methods.
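The two families of sampling strategies compared above can be illustrated with a minimal sketch. The helper names (`random_oversample`, `random_undersample`) and the toy data are illustrative, not from the paper; real studies typically use more elaborate methods such as SMOTE, which synthesizes new minority examples rather than duplicating existing ones.

```python
import random

def random_oversample(majority, minority, seed=0):
    # Illustrative helper: duplicate randomly chosen minority
    # examples until both classes have the same size.
    rng = random.Random(seed)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra

def random_undersample(majority, minority, seed=0):
    # Illustrative helper: randomly discard majority examples
    # until both classes have the same size.
    rng = random.Random(seed)
    kept = rng.sample(majority, len(minority))
    return kept + minority

# Toy imbalanced training set: 8 majority (label 0) vs 2 minority (label 1).
maj = [([float(i), 0.0], 0) for i in range(8)]
mino = [([float(i), 1.0], 1) for i in range(2)]

balanced_over = random_oversample(maj, mino)    # 16 examples, 8 per class
balanced_under = random_undersample(maj, mino)  # 4 examples, 2 per class
```

Over-sampling keeps all available information at the cost of repeated minority examples (and a larger training set); under-sampling discards potentially useful majority examples, which is one intuition for why over-sampling tends to perform better in the reported results.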