Robust classification of imbalanced data using one-class and two-class SVM-based multiclassifiers

Authors:
Sebastián Maldonado;Claudio Montecinos
Affiliations:
Universidad de Los Andes, Mons. Alvaro del Portillo, Las Condes, Santiago, Chile;Operations Management Master Program, Universidad de Talca, Curicó, Chile
Venue:
Intelligent Data Analysis - Business Analytics and Intelligent Optimization
Year:
2014

Citing 27

On Combining Classifiers
IEEE Transactions on Pattern Analysis and Machine Intelligence

Support vector machines, reproducing kernel Hilbert spaces, and randomized GACV
Advances in kernel methods

Adaptive Fraud Detection
Data Mining and Knowledge Discovery

Neural Network Ensembles
IEEE Transactions on Pattern Analysis and Machine Intelligence

Measures of Diversity in Classifier Ensembles and Their Relationship with the Ensemble Accuracy
Machine Learning

Classifier Conditional Posterior Probabilities
SSPR '98/SPR '98 Proceedings of the Joint IAPR International Workshops on Advances in Pattern Recognition

Ensemble Methods in Machine Learning
MCS '00 Proceedings of the First International Workshop on Multiple Classifier Systems

Support Vector Data Description
Machine Learning

Editorial: special issue on learning from imbalanced data sets
ACM SIGKDD Explorations Newsletter - Special issue on learning from imbalanced datasets

Mining with rarity: a unifying framework
ACM SIGKDD Explorations Newsletter - Special issue on learning from imbalanced datasets

Extreme re-balancing for SVMs: a case study
ACM SIGKDD Explorations Newsletter - Special issue on learning from imbalanced datasets

Feature selection for text categorization on imbalanced data
ACM SIGKDD Explorations Newsletter - Special issue on learning from imbalanced datasets

Cost-sensitive boosting for classification of imbalanced data
Pattern Recognition

Predicting credit card customer churn in banks using data mining
International Journal of Data Analysis Techniques and Strategies

A wrapper method for feature selection using Support Vector Machines
Information Sciences: an International Journal

Learning from Imbalanced Data
IEEE Transactions on Knowledge and Data Engineering

SMOTE: synthetic minority over-sampling technique
Journal of Artificial Intelligence Research

Issues in stacked generalization
Journal of Artificial Intelligence Research

On learning algorithm selection for classification
Applied Soft Computing

Feature Selection with High-Dimensional Imbalanced Data
ICDMW '09 Proceedings of the 2009 IEEE International Conference on Data Mining Workshops

Simultaneous feature selection and classification using kernel-penalized support vector machines
Information Sciences: an International Journal

Active learning and subspace clustering for anomaly detection
Intelligent Data Analysis

An exploration of learning when data is noisy and imbalanced
Intelligent Data Analysis

A novel SVM modeling approach for highly imbalanced and overlapping classification
Intelligent Data Analysis

Addressing data complexity for imbalanced data sets: analysis of SMOTE-based oversampling and evolutionary undersampling
Soft Computing - A Fusion of Foundations, Methodologies and Applications - Special Issue on Intelligent Systems, Design and Applications (ISDA 2009)

Data characterization for effective prototype selection
IbPRIA'05 Proceedings of the Second Iberian conference on Pattern Recognition and Image Analysis - Volume Part II

Beyond accuracy, f-score and ROC: a family of discriminant measures for performance evaluation
AI'06 Proceedings of the 19th Australian joint conference on Artificial Intelligence: advances in Artificial Intelligence

Abstract

The class imbalance problem is a relatively new challenge that has attracted growing attention from both industry and academia, since it strongly affects classification performance. Research has also established that class imbalance is not a problem by itself; rather, its interaction with class overlap and noise has an important impact on predictive performance and stability. This fact has motivated the development of several approaches for the classification of imbalanced data (see, e.g., [29,39]). In this paper, we address credit card customer churn prediction, an important topic in business analytics, using an ensemble of classifiers. Since this problem is considered highly imbalanced, we employ different classification techniques, such as Support Vector Data Description (SVDD) and two-class SVMs. The main idea is to address both class imbalance and class overlap by stacking different classification approaches, while evaluating the diversity of the individual classifiers using meta-learning measures. We performed experiments on artificial data sets and on one real customer churn prediction problem from a Chilean financial entity, comparing our approach with well-known classification techniques for imbalanced data. The proposed strategy achieves an improvement of 6.1% over the best individual classifier in terms of predictive performance, providing accurate and robust classification models for different levels of balance and noise.
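
The abstract mentions evaluating the diversity of the individual classifiers before stacking them, without naming the measures used. As an illustration only, the sketch below computes two pairwise diversity measures that are common in the ensemble literature, the disagreement measure and Yule's Q statistic. These are assumed here for the example and may differ from the measures used in the paper. The function names and the toy predictions (`pred_one_class`, `pred_two_class`) are hypothetical and do not come from the paper's data.

```python
from typing import Sequence, Tuple


def pairwise_counts(y_true: Sequence[int],
                    pred_a: Sequence[int],
                    pred_b: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count samples by whether classifiers A and B are correct.

    Returns (n11, n10, n01, n00): both correct, only A correct,
    only B correct, both wrong.
    """
    if not (len(y_true) == len(pred_a) == len(pred_b)):
        raise ValueError("all inputs must have the same length")
    n11 = n10 = n01 = n00 = 0
    for y, a, b in zip(y_true, pred_a, pred_b):
        a_ok, b_ok = (a == y), (b == y)
        if a_ok and b_ok:
            n11 += 1
        elif a_ok:
            n10 += 1
        elif b_ok:
            n01 += 1
        else:
            n00 += 1
    return n11, n10, n01, n00


def disagreement(y_true, pred_a, pred_b) -> float:
    """Fraction of samples on which exactly one classifier is correct."""
    n11, n10, n01, n00 = pairwise_counts(y_true, pred_a, pred_b)
    total = n11 + n10 + n01 + n00
    if total == 0:
        raise ValueError("no samples given")
    return (n10 + n01) / total


def q_statistic(y_true, pred_a, pred_b) -> float:
    """Yule's Q in [-1, 1]; negative values mean the classifiers tend
    to make their errors on different samples."""
    n11, n10, n01, n00 = pairwise_counts(y_true, pred_a, pred_b)
    denom = n11 * n00 + n01 * n10
    if denom == 0:
        raise ValueError("Q statistic is undefined for these counts")
    return (n11 * n00 - n01 * n10) / denom


# Toy, imbalanced labels (1 = churner) and made-up predictions from a
# one-class model and a two-class model.
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
pred_one_class = [1, 0, 0, 0, 1, 0, 0, 0]
pred_two_class = [0, 1, 0, 0, 0, 0, 1, 0]

print(disagreement(y_true, pred_one_class, pred_two_class))  # 0.5
print(q_statistic(y_true, pred_one_class, pred_two_class))   # -1.0
```

In this toy case the two classifiers never fail on the same sample, so Q reaches its minimum of -1. That is the kind of complementarity a stacked ensemble can exploit, since a meta-learner can in principle correct one model's errors using the other's output.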