Preprocessing unbalanced data using support vector machine

Authors:
M. A. H. Farquad; Indranil Bose
Affiliations:
School of Business, The University of Hong Kong, Pok Fu Lam Road, Hong Kong; Indian Institute of Management Calcutta, Diamond Harbour Road, Kolkata 700104, India
Venue:
Decision Support Systems
Year:
2012

Abstract

This paper applies the support vector machine (SVM) to the class imbalance problem. Its objective is to examine the feasibility and efficiency of SVM as a preprocessor. The study analyzes different classification algorithms used to predict which customers hold a caravan car policy, based on their sociodemographic data and history of product ownership. A series of experiments was conducted to test several computational intelligence techniques, namely Multilayer Perceptron (MLP), Logistic Regression (LR), and Random Forest (RF). Standard balancing techniques such as under-sampling, over-sampling, and Synthetic Minority Over-sampling TEchnique (SMOTE) are also employed. A data-balancing strategy for handling imbalanced class distributions is then proposed. The proposed approach first trains an SVM as a preprocessor and then replaces the actual target values of the training data with the predictions of the trained SVM. This modified training data is subsequently used to train techniques such as MLP, LR, and RF. Measured by sensitivity, the proposed approach not only balances the data effectively but also yields more instances of the minority class, which in turn improves the performance of the intelligence techniques.
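The relabeling step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses scikit-learn, a synthetic imbalanced dataset in place of the caravan policy data, and an RBF-kernel SVM with class_weight="balanced" (the abstract does not give the SVM settings). The helper name relabel_with_svm is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def relabel_with_svm(X_train, y_train, **svm_params):
    """Fit an SVM and return its predictions on the training set.

    Hypothetical helper: the returned labels replace the actual targets,
    as the abstract describes. The SVM parameters are assumptions.
    """
    svm = SVC(**svm_params).fit(X_train, y_train)
    return svm.predict(X_train)


# Synthetic stand-in for an imbalanced dataset (about 6% minority class).
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.94, 0.06], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Step 1: SVM as preprocessor; its predictions become the new training targets.
# class_weight="balanced" is an assumption, not taken from the paper.
y_svm = relabel_with_svm(
    X_train, y_train, kernel="rbf", class_weight="balanced", random_state=0
)

# Step 2: train a downstream classifier (LR here) on original vs. modified targets.
baseline = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proposed = LogisticRegression(max_iter=1000).fit(X_train, y_svm)

# Step 3: compare sensitivity (recall on the minority class) against the
# actual test labels, which are never modified.
sens_baseline = recall_score(y_test, baseline.predict(X_test))
sens_proposed = recall_score(y_test, proposed.predict(X_test))

print("minority instances, actual targets:", int(y_train.sum()))
print("minority instances, SVM targets:   ", int(y_svm.sum()))
print("sensitivity, baseline LR:", round(sens_baseline, 3))
print("sensitivity, SVM-relabeled LR:", round(sens_proposed, 3))
```

Only the training targets are replaced; the test labels stay untouched so that sensitivity is measured against ground truth. Swapping LogisticRegression for an MLP or Random Forest classifier would follow the same pattern.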