A novel SVM modeling approach for highly imbalanced and overlapping classification

  • Authors:
  • Yu Qu;Hongye Su;Lichao Guo;Jian Chu

  • Affiliations:
  • (Correspd. Tel.: +86 571 8795 2233 Ext. 6631, +86 137 3228 0808 (Mobile)/ E-mail: yqu.zju@gmail.com/ zero_qy@yahoo.com.cn) State Key Laboratory of Industrial Control Technology, Institute of Cyber ...;State Key Laboratory of Industrial Control Technology, Institute of Cyber-System and Control, Yuquan Campus, Zhejiang University, Hangzhou, 310027, China;State Key Laboratory of Industrial Control Technology, Institute of Cyber-System and Control, Yuquan Campus, Zhejiang University, Hangzhou, 310027, China;State Key Laboratory of Industrial Control Technology, Institute of Cyber-System and Control, Yuquan Campus, Zhejiang University, Hangzhou, 310027, China

  • Venue:
  • Intelligent Data Analysis
  • Year:
  • 2011

Abstract

Traditional classification algorithms can perform poorly on highly imbalanced and overlapping data sets. In this paper, we focus on modifying support vector machines (SVMs) to make them suitable for highly imbalanced and overlapping (HIO) classification. Based on an analysis of most SVM learning algorithms for imbalanced classification, we argue that the key problem in SVM-based algorithms, owing to the linearity property of SVMs, is that increasing the number of correctly predicted minority samples causes even more majority samples to be misclassified. We then develop a novel algorithm, HIO-SVM, that can recognize all minority samples while minimizing the error rate on majority ones. The proposed approach identifies the non-overlapping samples in one feature space; furthermore, by iteratively shifting kernel spaces, it recognizes all non-overlapping samples in different kernel spaces. Because of the highly imbalanced distribution, the remaining overlapping samples can be regarded as minority. All minority samples can then be predicted correctly while the error rate on majority samples is guaranteed to be minimized at the same time. Finally, numerous case studies demonstrate the properties and effectiveness of the proposed HIO-SVM algorithm.
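
The iterative idea in the abstract (remove the majority samples that do not overlap with the minority class in one space, move to another space, and treat whatever remains as minority) can be illustrated with a toy sketch. This is not the HIO-SVM algorithm itself: plain feature maps stand in for kernel spaces, a simple range test stands in for the SVM decision boundary, and the names `peel_majority` and `feature_maps` are hypothetical, introduced only for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def peel_majority(
    points: Sequence[Point],
    labels: Sequence[int],  # 0 = majority, 1 = minority
    feature_maps: Sequence[Callable[[Point], float]],
) -> List[int]:
    """Toy stand-in for iterating over kernel spaces (an assumption, not HIO-SVM).

    In each feature map, a majority sample whose mapped value lies outside the
    [min, max] range of the minority values is treated as non-overlapping and
    labelled majority. Samples still unresolved after the last map are
    predicted as minority.
    """
    pred = [1] * len(points)           # default prediction: minority
    active = set(range(len(points)))   # samples not yet resolved
    for phi in feature_maps:
        minority_vals = [phi(points[i]) for i in active if labels[i] == 1]
        if not minority_vals:
            break
        lo, hi = min(minority_vals), max(minority_vals)
        cleared = {
            i for i in active
            if labels[i] == 0 and not (lo <= phi(points[i]) <= hi)
        }
        for i in cleared:
            pred[i] = 0
        active -= cleared
    return pred


# Six majority samples and two minority samples (made-up data).
points = [(0, 0), (5, 1), (1, 5), (2, 2), (-3, 2), (2.25, 2.25), (2, 2.5), (2.5, 2)]
labels = [0, 0, 0, 0, 0, 0, 1, 1]
feature_maps = [
    lambda p: p[0],         # first "space": x coordinate
    lambda p: p[1],         # second "space": y coordinate
    lambda p: p[0] + p[1],  # third "space": x + y
]

pred = peel_majority(points, labels, feature_maps)
print(pred)  # [0, 0, 0, 0, 0, 1, 1, 1]
```

In this toy run every minority sample is predicted as minority by construction, and the only error is the one majority sample, `(2.25, 2.25)`, that overlaps the minority range in all three maps. The sample `(2, 2)` is indistinguishable in the first two maps and is only separated in the third, which mirrors the motivation for switching spaces.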