This paper argues that severe class imbalance is not just an interesting technical challenge that improved learning algorithms will address; it is a much more serious problem. To be useful, a classifier must appreciably outperform a trivial solution, such as always predicting the majority class. Any application that is inherently noisy limits the error rate, and the cost, that can be achieved. When the data are normally distributed, even a Bayes-optimal classifier yields a vanishingly small reduction in the majority classifier's error rate, and cost, as the imbalance increases. For fat-tailed distributions, and when practical classifiers are used, often no reduction is achieved at all.
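The Gaussian claim can be illustrated with a small numerical sketch. The code below is not from the paper; it assumes two one-dimensional classes N(0,1) (majority) and N(d,1) (minority) with an illustrative separation d=2, computes the Bayes-optimal error analytically, and compares it with the majority classifier's error (which equals the minority prior). The relative reduction shrinks as the imbalance grows.

```python
# Illustrative sketch (not the paper's experiment): Bayes-optimal error
# vs. the majority classifier's error for two 1-D Gaussian classes.
import math

def norm_cdf(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bayes_error(p_minority, d=2.0):
    """Error of the Bayes-optimal rule for classes N(0,1) (majority)
    and N(d,1) (minority), with prior p_minority on the minority class."""
    p0, p1 = 1.0 - p_minority, p_minority
    # Likelihood-ratio decision threshold for equal-variance Gaussians.
    t = d / 2.0 + math.log(p0 / p1) / d
    # Misclassify majority as minority + minority as majority.
    return p0 * (1.0 - norm_cdf(t)) + p1 * norm_cdf(t - d)

for p1 in (0.5, 0.1, 0.01, 0.001):
    majority_error = p1  # always predicting the majority class errs on p1 of cases
    reduction = (majority_error - bayes_error(p1)) / majority_error
    print(f"minority prior={p1:6.3f}  relative error reduction={reduction:.1%}")
```

With these assumed parameters, the relative reduction falls from roughly two thirds at a balanced prior to well under one percent at a 1000:1 imbalance, matching the abstract's point that the Bayes-optimal classifier barely improves on the trivial baseline as imbalance becomes severe.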