C4.5: programs for machine learning
C4.5: programs for machine learning
Machine Learning for the Detection of Oil Spills in Satellite Radar Images
Machine Learning - Special issue on applications of machine learning and the knowledge discovery process
Tree Induction for Probability-Based Ranking
Machine Learning
Editorial: special issue on learning from imbalanced data sets
ACM SIGKDD Explorations Newsletter - Special issue on learning from imbalanced datasets
A study of the behavior of several methods for balancing machine learning training data
ACM SIGKDD Explorations Newsletter - Special issue on learning from imbalanced datasets
An empirical comparison of supervised learning algorithms
ICML '06 Proceedings of the 23rd international conference on Machine learning
Automatically countering imbalance and its empirical relationship to cost
Data Mining and Knowledge Discovery
SMOTE: synthetic minority over-sampling technique
Journal of Artificial Intelligence Research
Ensembles of probability estimation trees for customer churn prediction
IEA/AIE'10 Proceedings of the 23rd international conference on Industrial engineering and other applications of applied intelligent systems - Volume Part II
Hellinger distance decision trees are robust and skew-insensitive
Data Mining and Knowledge Discovery
Hi-index | 0.00 |
Many machine learning applications like finance, medicine, and risk management suffer from class imbalance: cases of interest occur rarely. Further complicating these applications is that the training and testing samples might differ significantly in their respective class distributions. Sampling has been shown to be a strong solution to imbalance and additionally offers a rich parameter space from which to select classifiers. This paper is concerned with the interaction between Probability Estimation Trees (PETs) [1], sampling, and performance metrics as testing distributions fluctuate substantially. A set of comprehensive analyses is presented, which anticipate classifier performance through a set of widely varying testing distributions.