A Hybrid Re-sampling Method for SVM Learning from Imbalanced Data Sets

Authors:
Peng Li; Pei-Li Qiao; Yuan-Chao Liu
Venue:
FSKD '08 Proceedings of the 2008 Fifth International Conference on Fuzzy Systems and Knowledge Discovery - Volume 02
Year:
2008

Citing 0
Cited 3

A new weighted approach to imbalanced data classification problem via support vector machine with quadratic cost function
Expert Systems with Applications: An International Journal

Using a boosted tree classifier for text segmentation in hand-annotated documents
Pattern Recognition Letters

GSVM: An SVM for handling imbalanced accuracy between classes in bi-classification problems
Applied Soft Computing

Abstract

Support Vector Machines (SVMs) have been widely studied and applied successfully in many fields. However, SVM performance drops significantly on imbalanced data sets, in which negative instances greatly outnumber positive instances. This paper analyzes the intrinsic factors behind this failure and proposes a suitable re-sampling method. We re-sample the imbalanced data using variable SOM clustering to overcome the flaws of traditional re-sampling methods, such as heavy randomness, subjective interference, and information loss. We then prune the training set with the K-NN rule to resolve data confusion, which improves the generalization ability of the SVM. Experimental results show that our method clearly improves SVM performance on imbalanced data sets.
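
The two preprocessing steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify the "variable" SOM, the sampling direction, or the exact K-NN rule. The sketch assumes a small fixed-size one-dimensional SOM that under-samples the negative (majority) class by keeping the instances closest to each unit, followed by a K-NN pruning pass that drops negatives whose nearest neighbours mostly belong to the other class. All function names and parameters (`som_prototypes`, `som_undersample`, `knn_prune`, `n_units`, `per_unit`, `k`) are hypothetical.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def som_prototypes(points, n_units=4, epochs=30, lr0=0.5, seed=0):
    """Train a tiny 1-D SOM and return its unit weight vectors.

    Assumption: a fixed number of units; the paper's "variable" SOM is
    not described in the abstract and is not reproduced here.
    """
    rng = random.Random(seed)
    units = [list(p) for p in rng.sample(points, n_units)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs          # decays from 1 towards 0
        lr = lr0 * frac                      # learning rate schedule
        radius = max(1.0, (n_units / 2) * frac)  # neighbourhood width
        for p in rng.sample(points, len(points)):  # shuffled pass
            bmu = min(range(n_units), key=lambda i: dist(units[i], p))
            for i in range(n_units):
                # Units near the best-matching unit move more.
                h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                units[i] = [w + lr * h * (x - w) for w, x in zip(units[i], p)]
    return units


def som_undersample(negatives, n_units=4, per_unit=5, seed=0):
    """Keep at most per_unit real negatives closest to each SOM unit."""
    units = som_prototypes(negatives, n_units=n_units, seed=seed)
    clusters = [[] for _ in units]
    for p in negatives:
        bmu = min(range(len(units)), key=lambda i: dist(units[i], p))
        clusters[bmu].append(p)
    kept = []
    for unit, members in zip(units, clusters):
        members.sort(key=lambda p: dist(unit, p))
        kept.extend(members[:per_unit])
    return kept


def knn_prune(X, y, k=3, prune_label=0):
    """Drop prune_label instances whose k nearest neighbours mostly disagree.

    Assumption: only the majority class is pruned; the abstract does not
    say which instances the K-NN rule removes.
    """
    keep_X, keep_y = [], []
    for i, (p, label) in enumerate(zip(X, y)):
        if label == prune_label:
            others = sorted((j for j in range(len(X)) if j != i),
                            key=lambda j: dist(X[j], p))[:k]
            agree = sum(1 for j in others if y[j] == label)
            if agree * 2 < len(others):  # fewer than half agree: drop it
                continue
        keep_X.append(p)
        keep_y.append(label)
    return keep_X, keep_y


# Usage: synthetic 2-D data, 60 negatives (label 0) and 6 positives (label 1).
rng = random.Random(1)
neg = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(60)]
pos = [(rng.gauss(2, 0.5), rng.gauss(2, 0.5)) for _ in range(6)]
neg_kept = som_undersample(neg, n_units=4, per_unit=5)
X_train, y_train = knn_prune(neg_kept + pos, [0] * len(neg_kept) + [1] * len(pos))
print(len(neg), len(neg_kept), len(X_train))
```

The resulting `X_train` and `y_train` would then be passed to whatever SVM trainer is in use. Keeping real instances near each unit, rather than the unit weight vectors themselves, is a design choice of this sketch so that no synthetic points enter the training set.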