Hellinger distance decision trees are robust and skew-insensitive
Data Mining and Knowledge Discovery
Learning from imbalanced datasets presents a difficult problem in which traditional learning algorithms often perform poorly: the objective functions used to learn classifiers typically favor the larger, and often less important, classes. This paper compares the performance of several popular decision tree splitting criteria (information gain, Gini measure, and DKM) and identifies Hellinger distance as a new skew-insensitive measure. We outline the strengths of Hellinger distance under class imbalance, propose its use in constructing decision trees, and perform a comprehensive comparative analysis of the decision tree construction methods. In addition, we evaluate each tree within a powerful sampling wrapper framework to capture the interaction between the splitting metric and sampling. We evaluate across a wide range of datasets and determine which methods operate best under class imbalance.
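To illustrate why Hellinger distance is skew-insensitive, the sketch below scores a candidate binary split by the Hellinger distance between the within-class branch distributions. This is a minimal illustration of the idea, not the authors' implementation; the function name and the example counts are our own. Because the criterion depends only on the *proportion* of each class routed to each branch (not on the class priors), its value is unchanged if the minority class is made arbitrarily rare.

```python
import math

def hellinger_split(pos_counts, neg_counts):
    """Hellinger distance between the class-conditional distributions
    induced by a candidate split (illustrative sketch of the HDDT idea).

    pos_counts[i] / neg_counts[i]: positives / negatives routed to branch i.
    Depends only on within-class proportions, so it ignores class skew.
    """
    total_pos, total_neg = sum(pos_counts), sum(neg_counts)
    return math.sqrt(sum(
        (math.sqrt(p / total_pos) - math.sqrt(n / total_neg)) ** 2
        for p, n in zip(pos_counts, neg_counts)
    ))

# A split that isolates most positives scores higher than an uninformative
# one, regardless of how rare the positive class is.
good = hellinger_split(pos_counts=[90, 10], neg_counts=[100, 900])
poor = hellinger_split(pos_counts=[50, 50], neg_counts=[500, 500])
```

Here `good` evaluates a split sending 90% of positives left and 90% of negatives right, while `poor` splits both classes evenly; the former scores strictly higher, and scaling the negative counts by any factor leaves both scores unchanged.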