Linear Principal Component Analysis (LPCA) is a simple and widely used technique for reducing feature dimensionality. Its extension, Kernel Principal Component Analysis (KPCA), outperforms LPCA on non-linear data by operating in a high-dimensional feature space. However, on large datasets with a high-dimensional input space, KPCA suffers from a memory bottleneck and handles imbalanced classification problems poorly. This paper presents an approach that reduces the complexity of KPCA training by condensing the training set with sampling and clustering techniques as a pre-processing step. The experiments were carried out on a large real-world telecommunication dataset and assessed on a churn prediction task. They show that the proposed approach, when combined with clustering techniques, can efficiently reduce feature dimensionality and outperforms standard PCA for customer churn prediction.
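The condensation idea can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes scikit-learn, uses k-means centroids as the condensed training set, and substitutes random data for the telecommunication dataset. The memory saving comes from fitting KPCA on the small condensed set, so the kernel matrix is k x k rather than n x n.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 20))   # stand-in for a large training set (n=5000)

# Step 1: clustering-based condensation -- k centroids replace the n rows
k = 200
centroids = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).cluster_centers_

# Step 2: fit KPCA on the condensed set; the kernel matrix is now
# 200 x 200 instead of 5000 x 5000
kpca = KernelPCA(n_components=10, kernel="rbf", gamma=0.1).fit(centroids)

# Step 3: project the full dataset into the reduced feature space
X_reduced = kpca.transform(X)
print(X_reduced.shape)            # (5000, 10)
```

A sampling-based variant would simply replace the k-means step with a random subset of rows; either way, the quadratic cost of building the kernel matrix depends only on the condensed set size.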