Comparing alternative classifiers for database marketing: The case of imbalanced datasets

Authors:
Ekrem Duman;Yeliz Ekinci;Aydın Tanrıverdi
Affiliations:
Dogus University, Industrial Engineering Department, Acibadem, 34722 Istanbul, Turkey;Dogus University, Industrial Engineering Department, Acibadem, 34722 Istanbul, Turkey;Dogus University, Industrial Engineering Department, Acibadem, 34722 Istanbul, Turkey
Venue:
Expert Systems with Applications: An International Journal
Year:
2012

Abstract

Various algorithms are used for binary classification, in which each case is assigned to one of two non-overlapping classes. The area under the receiver operating characteristic (ROC) curve is the most widely used metric for evaluating the performance of alternative binary classifiers. In this study, we consider application domains characterized by a high degree of class imbalance, where identifying the minority class is more important. For such domains, we show that hit-rate-based measures are more appropriate for assessing model performance and that they should be measured on out-of-time samples. We also try to identify the optimum composition of the training set. Logistic regression, neural network, and CHAID algorithms are implemented for a real marketing problem of a bank, and their performances are compared.
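To make the contrast between the two kinds of metric concrete, the following is a minimal sketch, not the authors' implementation. It assumes that "hit rate" means the share of actual minority-class cases among the top-scored fraction of cases, which is a common reading in database marketing but is not defined in the abstract. The function names, the `fraction` parameter, and the toy data are all hypothetical.

```python
def hit_rate_at_top(scores, labels, fraction):
    """Share of positives among the top `fraction` of cases ranked by score.

    Assumed definition of a hit-rate-based measure; labels are 0/1.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    k = max(1, int(len(ranked) * fraction))  # number of cases that get targeted
    return sum(label for _, label in ranked[:k]) / k


def roc_auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy sample: 10 cases, 2 of them in the minority class.
labels = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

top_20_hit_rate = hit_rate_at_top(scores, labels, 0.2)
auc = roc_auc(scores, labels)
print(f"hit rate in top 20%: {top_20_hit_rate:.4f}, AUC: {auc:.4f}")
```

The AUC summarizes ranking quality over the whole score range, while the hit rate looks only at the cases that would actually be targeted. To follow the abstract's recommendation, `scores` and `labels` would come from an out-of-time sample, that is, from a period later than the one used for training.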