Expert Systems with Applications: An International Journal
Many algorithms exist for binary classification, in which each case is assigned to one of two non-overlapping classes. The area under the receiver operating characteristic (ROC) curve is the most widely used metric for evaluating the performance of alternative binary classifiers. In this study, we consider application domains in which a high degree of class imbalance is the main characteristic and identifying the minority class is more important. For such domains, we show that hit-rate-based measures are more appropriate for assessing model performance and that they should be measured on out-of-time samples. We also try to identify the optimum composition of the training set. Logistic regression, neural network, and CHAID algorithms are implemented for a real marketing problem of a bank, and their performances are compared.
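To illustrate how the area under the ROC curve and a hit-rate-based measure can rank two classifiers differently on imbalanced data, the following is a minimal sketch in Python. It is not the study's implementation. The abstract does not define the hit rate precisely, so the sketch assumes it is the share of true minority-class cases among the top-scored fraction of cases; the function names `auc` and `hit_rate_at` and the toy data are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def hit_rate_at(scores, labels, fraction):
    """Assumed definition: share of positives among the top `fraction` by score."""
    k = max(1, round(len(scores) * fraction))
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    return sum(y for _, y in ranked[:k]) / k


# Toy data (invented): 20 cases, 2 of them in the minority class.
# Each label list gives the true classes in the order one hypothetical
# model ranks the cases, best score first.
n = 20
scores = [1.0 - i / n for i in range(n)]
labels_a = [1 if i in (0, 11) else 0 for i in range(n)]  # model A: ranks 1 and 12
labels_b = [1 if i in (2, 3) else 0 for i in range(n)]   # model B: ranks 3 and 4

for name, labels in (("A", labels_a), ("B", labels_b)):
    print(name, round(auc(scores, labels), 3), hit_rate_at(scores, labels, 0.10))
```

In this toy case model B has the higher AUC (about 0.889 against 0.722), yet model A places a minority-class case in the top 10% while model B places none there. To follow the abstract's recommendation, the scores and labels passed to these functions would come from an out-of-time sample, that is, data from a later period than the training set.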