Preprocessing unbalanced data using support vector machine

  • Authors:
  • M. A. H. Farquad;Indranil Bose

  • Affiliations:
  • School of Business, The University of Hong Kong, Pok Fu Lam Road, Hong Kong;Indian Institute of Management Calcutta, Diamond Harbour Road, Kolkata 700104, India

  • Venue:
  • Decision Support Systems
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper deals with the application of support vector machine (SVM) to deal with the class imbalance problem. The objective of this paper is to examine the feasibility and efficiency of SVM as a preprocessor. Our study analyzes different classification algorithms that are employed to predict the customers with caravan car policy based on his/her sociodemographic data and history of product ownership. A series of experiments was conducted to test various computational intelligence techniques viz., Multilayer Perceptron (MLP), Logistic Regression (LR), and Random Forest (RF). Various standard balancing techniques such as under-sampling, over-sampling and Synthetic Minority Over-sampling TEchnique (SMOTE) are also employed. Subsequently, a strategy of data balancing for handling imbalanced distribution in data is proposed. The proposed approach first employs SVM as a preprocessor and the actual target values of training data are then replaced by the predictions of trained SVM. Later, this modified training data is used to train techniques such as MLP, LR, and RF. Based on the measure of sensitivity, it is observed that the proposed approach not only balances the data effectively but also provides more number of instances for minority class, which in turn enhances the performance of the intelligence techniques.