Using genetic K-means algorithm for PCA regression data in customer churn prediction

Authors:
Bingquan Huang;T. Satoh;Y. Huang;M.-T. Kechadi;B. Buckley
Affiliations:
School of Computer Science and Informatics, University College Dublin, Belfield, Dublin 4, Ireland (all authors)
Venue:
ADMA'10 Proceedings of the 6th international conference on Advanced data mining and applications - Volume Part II
Year:
2010

Abstract

An imbalanced distribution of samples between churners and non-churners can strongly affect churn prediction results in the telecommunication services field. One way to address this is over-sampling by PCA regression. However, PCA regression may not generate good churn samples if the dataset is not linearly discriminable. To overcome this problem, we employed the Genetic K-means Algorithm to cluster the dataset into small, locally optimal subsets. The experiments were carried out on a real-world telecommunication dataset and assessed on a churn prediction task. They showed that the Genetic K-means Algorithm can improve prediction results for PCA regression, which then performed as well as SMOTE.
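
The idea of clustering the minority class before PCA-based over-sampling can be illustrated with a minimal sketch. It is not the authors' implementation: plain Lloyd's k-means stands in for the Genetic K-means Algorithm, and since the abstract does not specify how PCA regression generates samples, the sketch assumes a simple scheme that draws Gaussian scores along each cluster's leading principal directions. The function names (`kmeans`, `pca_oversample`, `clustered_oversample`) and all parameters are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (a stand-in, NOT the Genetic K-means Algorithm)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assign each sample to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def pca_oversample(X, n_new, n_components=2, seed=0):
    """Draw synthetic samples from a linear PCA model of X (assumed scheme)."""
    rng = np.random.default_rng(seed)
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions; S holds the singular values.
    _, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    q = min(n_components, len(S))
    # Standard deviation of the data along each retained direction.
    std = S[:q] / np.sqrt(max(len(X) - 1, 1))
    scores = rng.normal(size=(n_new, q)) * std
    # Map the sampled scores back to the original feature space.
    return mean + scores @ Vt[:q]


def clustered_oversample(X_minority, n_new, k=3, seed=0):
    """Cluster the minority class, then over-sample each cluster locally."""
    labels = kmeans(X_minority, k, seed=seed)
    parts = []
    for j in range(k):
        cluster = X_minority[labels == j]
        if len(cluster) < 2:
            continue  # too few points to fit a local PCA model
        # Allocate new samples in proportion to cluster size.
        share = int(round(n_new * len(cluster) / len(X_minority)))
        parts.append(pca_oversample(cluster, share, seed=seed + j))
    return np.vstack(parts)
```

Fitting one linear PCA model per cluster is what lets a linear generator follow a nonlinear class shape, which is the motivation the abstract gives for clustering first. Calling `clustered_oversample(X_churn, n_new=200, k=3)` on a hypothetical array `X_churn` of churner feature vectors returns roughly 200 synthetic rows with the same number of columns.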