A decision-theoretic generalization of on-line learning and an application to boosting
Journal of Computer and System Sciences - Special issue: 26th annual ACM symposium on the theory of computing (STOC '94), May 23–25, 1994, and second annual European conference on computational learning theory (EuroCOLT '95), March 13–15, 1995
Machine Learning
On Comparing Classifiers: Pitfalls to Avoid and a Recommended Approach
Data Mining and Knowledge Discovery
Computational Statistics & Data Analysis - Nonlinear methods and data mining
Enriching Scanner Panel Models with Choice Experiments
Marketing Science
Mining with rarity: a unifying framework
ACM SIGKDD Explorations Newsletter - Special issue on learning from imbalanced datasets
Modeling Browsing Behavior at Multiple Websites
Marketing Science
The relationship between Precision-Recall and ROC curves
ICML '06 Proceedings of the 23rd international conference on Machine learning
A fast algorithm for balanced sampling
Computational Statistics
Expert Systems with Applications: An International Journal
Modeling Online Browsing and Path Analysis Using Clickstream Data
Marketing Science
Random Forests for multiclass classification: Random MultiNomial Logit
Expert Systems with Applications: An International Journal
Expert Systems with Applications: An International Journal
SMOTE: synthetic minority over-sampling technique
Journal of Artificial Intelligence Research
Expert Systems with Applications: An International Journal
Expert Systems with Applications: An International Journal
Study on customer churn prediction methods based on multiple classifiers combination
IITA'09 Proceedings of the 3rd international conference on Intelligent information technology application
Building comprehensible customer churn prediction models with advanced rule induction techniques
Expert Systems with Applications: An International Journal
A data mining framework for detecting subscription fraud in telecommunication
Engineering Applications of Artificial Intelligence
Expert Systems with Applications: An International Journal
Ensembles of probability estimation trees for customer churn prediction
IEA/AIE'10 Proceedings of the 23rd international conference on Industrial engineering and other applications of applied intelligent systems - Volume Part II
Exploring discrepancies in findings obtained with the KDD Cup '99 data set
Intelligent Data Analysis
Tuning metaheuristics: A data mining based approach for particle swarm optimization
Expert Systems with Applications: An International Journal
An empirical evaluation of rotation-based ensemble classifiers for customer churn prediction
Expert Systems with Applications: An International Journal
Data preparation techniques for improving rare class prediction
MAMECTIS/NOLASC/CONTROL/WAMUS'11 Proceedings of the 13th WSEAS international conference on mathematical methods, computational techniques and intelligent systems, and 10th WSEAS international conference on non-linear analysis, non-linear systems and chaos, and 7th WSEAS international conference on dynamical systems and control, and 11th WSEAS international conference on Wavelet analysis and multirate systems: recent researches in computational techniques, non-linear systems and control
Time-varying effects in the analysis of customer loyalty: A case study in insurance
Expert Systems with Applications: An International Journal
From information to operations: Service quality and customer retention
ACM Transactions on Management Information Systems (TMIS)
Adjusting and generalizing CBA algorithm to handling class imbalance
Expert Systems with Applications: An International Journal
Expert Systems with Applications: An International Journal
A new over-sampling approach: Random-SMOTE for learning from imbalanced data sets
KSEM'11 Proceedings of the 5th international conference on Knowledge Science, Engineering and Management
Modeling partial customer churn: On the value of first product-category purchase sequences
Expert Systems with Applications: An International Journal
Expert Systems with Applications: An International Journal
A window of opportunity: Assessing behavioural scoring
Expert Systems with Applications: An International Journal
International Journal of Information Retrieval Research
Training and assessing classification rules with imbalanced data
Data Mining and Knowledge Discovery
Social network analysis for customer churn prediction
Applied Soft Computing
Profit optimizing customer churn prediction with Bayesian network classifiers
Intelligent Data Analysis - Business Analytics and Intelligent Optimization
Customer churn is often a rare event in service industries, but one of great interest and great value. Until recently, however, class imbalance had received little attention in the context of data mining [Weiss, G. M. (2004). Mining with rarity: A unifying framework. SIGKDD Explorations, 6(1), 7-19]. In this study, we investigate how to better handle class imbalance in churn prediction. Using more appropriate evaluation metrics (AUC, lift), we measure the performance gains from sampling (both random and advanced under-sampling) and from two specific modelling techniques (gradient boosting and weighted random forests), compared to some standard modelling techniques. AUC and lift prove to be good evaluation metrics: AUC does not depend on a classification threshold and is therefore a better overall evaluation metric than accuracy, while lift is closely related to accuracy but has the advantage of being widely used in marketing practice [Ling, C., & Li, C. (1998). Data mining for direct marketing: Problems and solutions. In Proceedings of the fourth international conference on knowledge discovery and data mining (KDD-98). New York, NY: AAAI Press]. Results show that under-sampling can lead to improved prediction accuracy, especially when evaluated with AUC. Unlike Ling and Li (1998), we find no need to under-sample until the training set contains as many churners as non-churners. The advanced sampling technique CUBE yields no increase in predictive performance in this study, in line with the findings of Japkowicz [Japkowicz, N. (2000). The class imbalance problem: Significance and strategies. In Proceedings of the 2000 international conference on artificial intelligence (IC-AI'2000): Special track on inductive learning, Las Vegas, Nevada], who noted that sophisticated sampling techniques gave no clear advantage. Weighted random forests, as a cost-sensitive learner, perform significantly better than standard random forests and are therefore advised; they should, however, always be compared to logistic regression. Boosting is a very robust classifier, but never outperforms any other technique.
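The ideas in the abstract can be illustrated with a minimal sketch: random under-sampling of the majority class at a chosen churner-to-non-churner ratio, a cost-sensitive forest (here approximated with scikit-learn's `class_weight` option as a stand-in for weighted random forests), and threshold-free evaluation with AUC and top-decile lift. The synthetic dataset, the 1:2 ratio, and all model settings are illustrative assumptions, not the paper's actual data or code.

```python
# Sketch (not the paper's code): under-sampling, a cost-sensitive random
# forest, and evaluation with AUC and top-decile lift on synthetic data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)

# Synthetic imbalanced data: roughly 5% "churners" (class 1).
X, y = make_classification(n_samples=5000, weights=[0.95], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

def undersample(X, y, ratio=1.0):
    """Keep every minority (churner) row; randomly drop majority rows
    until minority/majority = ratio (ratio=1.0 gives a balanced set)."""
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    n_keep = min(len(majority), int(len(minority) / ratio))
    idx = np.concatenate([minority, rng.choice(majority, n_keep, replace=False)])
    return X[idx], y[idx]

def top_decile_lift(y_true, scores):
    """Churn rate among the top 10% highest-scored customers,
    divided by the overall churn rate."""
    top = np.argsort(scores)[::-1][: max(1, len(y_true) // 10)]
    return y_true[top].mean() / y_true.mean()

# Logistic regression on a 1:2 under-sampled set (per the abstract,
# a full 1:1 balance is not required).
X_s, y_s = undersample(X_tr, y_tr, ratio=0.5)
lr = LogisticRegression(max_iter=1000).fit(X_s, y_s)

# Cost-sensitive forest: class_weight up-weights the rare churner class.
wrf = RandomForestClassifier(n_estimators=200, class_weight="balanced",
                             random_state=0).fit(X_tr, y_tr)

for name, model in [("logit + under-sampling", lr), ("weighted RF", wrf)]:
    s = model.predict_proba(X_te)[:, 1]
    print(f"{name}: AUC={roc_auc_score(y_te, s):.3f}, "
          f"lift@10%={top_decile_lift(y_te, s):.2f}")
```

Because AUC and lift are computed from the ranking of predicted probabilities rather than from a 0.5 cut-off, they remain meaningful on the untouched, imbalanced test set even when the model was fit on a re-balanced training sample.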