This study proposes a normal distribution-based over-sampling approach to balance the number of instances in the different classes of a data set. The balanced training data are then used to learn classifiers that are unbiased with respect to the original data set. Under certain conditions, the proposed approach generates synthetic samples whose expected mean and variance match those of the original minority class data. Because the approach generates synthetic data with a probability distribution similar to that of the original data while expanding the class boundaries of the minority class, it can improve classification performance on the minority class. Experimental results on benchmark data sets show that, across several classical classification algorithms, the proposed approach outperforms alternative methods in most cases.
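The core idea described above can be sketched in a few lines: fit a per-feature mean and variance to the minority class, then draw synthetic samples from the corresponding normal distribution. This is a minimal illustrative sketch of the general technique, not the authors' exact algorithm; the function name `normal_oversample` and its parameters are assumptions for illustration.

```python
import numpy as np

def normal_oversample(X_min, n_new, rng=None):
    """Generate n_new synthetic minority samples by drawing from a
    per-feature normal distribution fitted to the minority class.

    Illustrative sketch only: it matches the minority class's mean and
    variance, as the paper describes, but omits the paper's specific
    generation conditions.
    """
    rng = np.random.default_rng(rng)
    mean = X_min.mean(axis=0)        # per-feature mean of minority class
    std = X_min.std(axis=0, ddof=1)  # per-feature sample standard deviation
    # Draw synthetic samples; their expected mean and variance match
    # those of the original minority class data.
    return rng.normal(loc=mean, scale=std, size=(n_new, X_min.shape[1]))

# Example: a 10-instance minority class, over-sampled with 90 synthetic
# instances so it matches a hypothetical 100-instance majority class.
X_min = np.random.default_rng(0).normal([2.0, -1.0], [0.5, 1.5], size=(10, 2))
X_syn = normal_oversample(X_min, n_new=90, rng=1)
X_balanced = np.vstack([X_min, X_syn])
```

Because the synthetic points are drawn from a distribution rather than interpolated between existing points (as in SMOTE), some of them fall outside the convex hull of the minority class, which is one way the minority class boundary can be expanded.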