A fast data preprocessing procedure for support vector regression

  • Authors:
  • Zhifeng Hao;Wen Wen;Xiaowei Yang;Jie Lu;Guangquan Zhang

  • Affiliations:
  • School of Mathematical Science, South China University of Technology, Guangzhou, China;College of Computer Science and Engineering, South China University of Technology, Guangzhou, China;School of Mathematical Science, South China University of Technology, Guangzhou, China;Faculty of Information Technology, University of Technology Sydney, Broadway, Australia;Faculty of Information Technology, University of Technology Sydney, Broadway, Australia

  • Venue:
  • IDEAL'06 Proceedings of the 7th international conference on Intelligent Data Engineering and Automated Learning
  • Year:
  • 2006

Abstract

This paper proposes a fast data preprocessing procedure (FDPP) for support vector regression (SVR). The method first divides the dataset into several subsets and then runs K-means clustering within each subset. The resulting clusters are classified by group size: centroids of small groups are eliminated, and the remaining centroids are used for SVR training. The relationship between group size and noisy clusters is discussed, and simulation results are given. The results show that FDPP removes most of the noise, preserves the useful statistical information, and reduces the number of training samples. Most importantly, FDPP runs very fast and maintains the good regression performance of SVR.
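
The pipeline described in the abstract (split, cluster each subset, drop small clusters, keep the remaining centroids) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not say how the dataset is divided, how many clusters are used, or what group size counts as "small", so the random split, the clustering in joint (input, target) space, and the parameters `n_subsets`, `k`, and `min_size` are all assumptions. The function names `kmeans` and `fdpp` are hypothetical, and the SVR training step itself is left out.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd-style K-means on tuples; returns (centroids, group sizes)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    groups = [[] for _ in range(k)]
    for _ in range(iters):
        # Assign every point to its nearest centroid (squared Euclidean distance).
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            groups[nearest].append(p)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[j]
            for j, g in enumerate(groups)
        ]
    return centroids, [len(g) for g in groups]


def fdpp(samples, n_subsets=4, k=5, min_size=3, seed=0):
    """Reduce `samples` (tuples of inputs followed by the target) to centroids.

    Assumptions not taken from the abstract: a random split into subsets,
    clustering on the full (inputs, target) tuple, and a fixed size threshold.
    """
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    # Assumed splitting rule: deal the shuffled samples round-robin into subsets.
    subsets = [shuffled[i::n_subsets] for i in range(n_subsets)]
    kept = []
    for i, subset in enumerate(subsets):
        centroids, sizes = kmeans(subset, min(k, len(subset)), seed=seed + i)
        # Centroids of small groups are treated as likely noise and discarded.
        kept.extend(c for c, n in zip(centroids, sizes) if n >= min_size)
    return kept
```

The returned centroids would then stand in for the original samples as the SVR training set. With the defaults above, at most `n_subsets * k` points are kept, which is where the reduction in training samples comes from under these assumptions.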