No consistent conclusions have been drawn from existing studies regarding the effectiveness of different approaches to learning from imbalanced data. In this paper we apply bias-variance analysis to study the utility of different strategies for imbalanced learning. We conduct experiments on 15 real-world imbalanced datasets, applying various re-sampling and induction-bias adjustment strategies to the standard decision tree, naive Bayes and k-nearest neighbour (k-NN) learning algorithms. Our main findings include: imbalanced class distribution is primarily a high-bias problem, which partly explains why it impedes the performance of many standard learning algorithms; compared to the re-sampling strategies, adjusting the induction bias changes the bias and variance components of classification error more substantially; in particular, the inverse distance weighting strategy can significantly reduce the variance error of k-NN. Based on these findings, we offer practical advice on applying re-sampling and induction-bias adjustment strategies to improve imbalanced learning.
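As an illustration of the kind of analysis the abstract describes, the sketch below estimates bias and variance components of 0-1 classification error over bootstrap training sets (a Domingos-style main-prediction decomposition) and compares uniform against inverse-distance-weighted k-NN on a synthetic imbalanced dataset. The synthetic data, the number of replicates, and the decomposition details are assumptions for illustration only, not the paper's exact experimental protocol.

```python
# Minimal sketch (assumptions, not the paper's protocol): estimate bias and
# variance components of 0-1 loss for uniform vs. distance-weighted k-NN on
# an imbalanced dataset, using bootstrap training sets and a Domingos-style
# main-prediction decomposition.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def bias_variance(model_factory, X_tr, y_tr, X_te, y_te, n_boot=50, seed=0):
    """Estimate bias/variance of 0-1 loss over bootstrap training replicates."""
    rng = np.random.RandomState(seed)
    preds = np.empty((n_boot, len(y_te)), dtype=int)
    for b in range(n_boot):
        idx = rng.randint(0, len(y_tr), len(y_tr))          # bootstrap resample
        preds[b] = model_factory().fit(X_tr[idx], y_tr[idx]).predict(X_te)
    # Main prediction = most frequent label across replicates for each test point.
    main_pred = np.array([np.bincount(preds[:, i]).argmax()
                          for i in range(preds.shape[1])])
    bias = np.mean(main_pred != y_te)        # main prediction disagrees with the truth
    variance = np.mean(preds != main_pred)   # replicates disagree with the main prediction
    return bias, variance

# Synthetic imbalanced problem (~5% minority class) -- an illustrative assumption.
X, y = make_classification(n_samples=3000, n_features=10,
                           weights=[0.95, 0.05], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

for name, factory in [
    ("uniform k-NN",  lambda: KNeighborsClassifier(n_neighbors=5, weights="uniform")),
    ("distance k-NN", lambda: KNeighborsClassifier(n_neighbors=5, weights="distance")),
]:
    b, v = bias_variance(factory, X_tr, y_tr, X_te, y_te)
    print(f"{name:14s} bias={b:.3f} variance={v:.3f}")
```

The same harness can be pointed at re-sampled training sets (e.g. randomly over-sampling the minority class before each bootstrap fit) to compare how re-sampling and induction-bias adjustment shift the two error components, which is the comparison the abstract summarises.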