Improved response modeling based on clustering, under-sampling, and ensemble

Authors:
Pilsung Kang; Sungzoon Cho; Douglas L. MacLachlan
Affiliations:
IT Management Programme, International Fusion School, Seoul National University of Science and Technology (Seoultech), 232 Gongneoung ro, Nowon-gu, 139-743 Seoul, South Korea; Department of Industrial Engineering, Seoul National University, 599 Gwanak-ro, Gwanak-gu, 151-744 Seoul, South Korea; Department of Marketing and International Business, Foster School of Business, University of Washington, Seattle, WA 98195, USA
Venue:
Expert Systems with Applications: An International Journal
Year:
2012

Abstract

The purpose of response modeling for direct marketing is to identify the customers who are likely to purchase a promoted product, based on their behavioral history and other available information. In contrast to a mass-marketing strategy, well-developed response models that target specific customers can raise a firm's profit by both increasing revenue and lowering marketing costs. Customer data used for response modeling typically suffer from a class imbalance problem: respondents make up a small proportion of the data relative to non-respondents. In this paper, we propose a novel data balancing method based on clustering, under-sampling, and ensemble to deal with the class imbalance problem and thus improve response models. Using publicly available response modeling data sets, we compared the proposed method with other data balancing methods in terms of prediction accuracy and profitability. To investigate the usability of the proposed method, we also built the response models with various prediction algorithms. Based on the response rate and profit analysis, we found that the proposed method (1) improved the response model by increasing the response rate and reducing performance variation, and (2) increased total profit by significantly boosting revenue.
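
The abstract names the three ingredients (clustering, under-sampling, ensemble) but does not give the procedure, so the following is only an illustrative sketch of one way they could be combined, not the authors' algorithm. It assumes that non-respondents are clustered with plain k-means, that each balanced training set keeps all respondents plus a size-proportional random draw from every non-respondent cluster, and that a simple nearest-centroid scorer stands in for the prediction algorithm. All function names and parameters (`balanced_subsets`, `fit_centroid`, `ensemble_score`, `k`, `n_models`) are hypothetical.

```python
import random
from statistics import mean


def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed choice); returns one label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [mean(col) for col in zip(*members)]
    return labels


def balanced_subsets(respondents, non_respondents, k, n_models, seed=0):
    """Build n_models roughly balanced training sets.

    Assumption: every set keeps all respondents and under-samples each
    non-respondent cluster in proportion to its size.
    """
    rng = random.Random(seed)
    labels = kmeans(non_respondents, k, seed=seed)
    clusters = [[p for p, lab in zip(non_respondents, labels) if lab == c]
                for c in range(k)]
    subsets = []
    for _ in range(n_models):
        sample = []
        for members in clusters:
            if not members:
                continue
            quota = round(len(respondents) * len(members) / len(non_respondents))
            sample += rng.sample(members, min(len(members), max(1, quota)))
        subsets.append((respondents, sample))
    return subsets


def fit_centroid(pos, neg):
    """Stand-in base learner: store the centroid of each class."""
    return ([mean(col) for col in zip(*pos)], [mean(col) for col in zip(*neg)])


def predict(model, x):
    """Vote 1.0 if x is closer to the respondent centroid, else 0.0."""
    pos_center, neg_center = model
    return 1.0 if dist2(x, pos_center) < dist2(x, neg_center) else 0.0


def ensemble_score(models, x):
    """Average the member votes to get a response score in [0, 1]."""
    return mean(predict(m, x) for m in models)
```

In use, one would call `balanced_subsets` on the two classes, fit one model per subset with `fit_centroid`, and rank customers by `ensemble_score`. Sampling from every cluster is meant to keep each reduced set representative of the full non-respondent population, while averaging over several sets is a common way to damp the variation that a single random under-sample introduces.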