Facing Imbalanced Classes through Aggregation of Classifiers
ICIAP '07 Proceedings of the 14th International Conference on Image Analysis and Processing
Many binary classification problems exhibit an imbalanced class distribution, which affects learning. Traditional machine learning algorithms are biased towards the majority class and therefore yield poor predictive accuracy on the minority class. To overcome this limitation, many approaches have been proposed to build artificially balanced training sets. Beyond their specific drawbacks, these approaches achieve more balanced per-class accuracies at the expense of global accuracy. This paper first reviews recent methods for coping with imbalanced datasets and then proposes a strategy that overcomes the main drawbacks of existing approaches. It is based on an ensemble of classifiers trained on balanced subsets of the original imbalanced training set, working in conjunction with a classifier trained on the original imbalanced dataset. The performance of the method has been evaluated on six public datasets, demonstrating its effectiveness in comparison with other approaches. The method also allows the system's behaviour to be tuned to the operating scenario.
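The strategy described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes random undersampling to build the balanced subsets, a toy nearest-centroid base learner as a stand-in for any classifier, and unweighted majority voting to combine the subset-trained members with the classifier trained on the full imbalanced set. All function and class names here are hypothetical.

```python
import random
from collections import Counter

class CentroidClassifier:
    """Toy base learner: assigns a point to the class of the nearest
    class centroid. Any real classifier could be substituted here."""
    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            pts = [x for x, lbl in zip(X, y) if lbl == label]
            self.centroids[label] = [sum(c) / len(pts) for c in zip(*pts)]
        return self

    def predict(self, x):
        def dist2(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(self.centroids, key=lambda lbl: dist2(self.centroids[lbl]))

def balanced_subsets(X, y, n_subsets, seed=0):
    """Pair the whole minority class (label 1) with equally sized random
    draws from the majority class (random undersampling, an assumption)."""
    rng = random.Random(seed)
    minority = [i for i, lbl in enumerate(y) if lbl == 1]
    majority = [i for i, lbl in enumerate(y) if lbl == 0]
    for _ in range(n_subsets):
        idx = minority + rng.sample(majority, len(minority))
        yield [X[i] for i in idx], [y[i] for i in idx]

def train_ensemble(X, y, n_subsets=5):
    """One classifier per balanced subset, plus one trained on the
    original imbalanced dataset, as in the abstract."""
    members = [CentroidClassifier().fit(Xs, ys)
               for Xs, ys in balanced_subsets(X, y, n_subsets)]
    members.append(CentroidClassifier().fit(X, y))  # original-data classifier
    return members

def predict(members, x):
    """Combine members by unweighted majority vote (an assumption)."""
    return Counter(m.predict(x) for m in members).most_common(1)[0][0]
```

The vote combination step is also where the abstract's "modify the system behaviour according to the operating scenario" could plausibly enter, e.g. by weighting the original-data classifier differently from the subset-trained members.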