A fast data preprocessing procedure for support vector regression

Authors:
Zhifeng Hao;Wen Wen;Xiaowei Yang;Jie Lu;Guangquan Zhang
Affiliations:
School of Mathematical Science, South China University of Technology, Guangzhou, China;College of Computer Science and Engineering, South China University of Technology, Guangzhou, China;School of Mathematical Science, South China University of Technology, Guangzhou, China;Faculty of Information Technology, University of Technology Sydney, Broadway, Australia;Faculty of Information Technology, University of Technology Sydney, Broadway, Australia
Venue:
IDEAL'06 Proceedings of the 7th international conference on Intelligent Data Engineering and Automated Learning
Year:
2006

Abstract

This paper proposes a fast data preprocessing procedure (FDPP) for support vector regression (SVR). The dataset is first divided into several subsets, and K-means clustering is then run within each subset. The resulting clusters are classified by group size: centroids of small clusters are eliminated, and the remaining centroids are used for SVR training. The relationship between group size and noisy clusters is discussed, and simulation results are given. The results show that FDPP removes most of the noise, preserves the useful statistical information, and reduces the number of training samples. Most importantly, FDPP runs very fast while maintaining the good regression performance of SVR.
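
The steps named in the abstract (split into subsets, cluster each subset with K-means, drop centroids of small clusters) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how subsets are formed, how many clusters are used, what counts as a "small" group, or whether clustering runs on inputs alone or on joint (x, y) samples. The random partition, the parameters `n_subsets`, `k`, and `min_size`, the clustering of joint (x, y) tuples, and the function names `kmeans` and `fdpp` are all assumptions made for illustration.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means on tuples; returns (centroids, group sizes)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumed initialisation: k random points
    groups = [[] for _ in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid (squared Euclidean distance).
            j = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            groups[j].append(p)
        # Recompute each centroid as the mean of its group;
        # an empty group keeps its previous centroid.
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids, [len(g) for g in groups]


def fdpp(samples, n_subsets=4, k=5, min_size=3, seed=0):
    """Sketch of the preprocessing idea: keep only centroids of large clusters.

    n_subsets, k and min_size are hypothetical parameters, not values
    taken from the paper.
    """
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)  # assumed: subsets are a random partition
    kept = []
    for s in range(n_subsets):
        subset = shuffled[s::n_subsets]
        if not subset:
            continue
        centroids, sizes = kmeans(subset, min(k, len(subset)), seed=seed + s)
        # Centroids backed by few samples are treated as likely noise and dropped.
        kept.extend(c for c, n in zip(centroids, sizes) if n >= min_size)
    return kept


# Toy data: a sine curve plus three hand-placed outliers (made-up values).
samples = [(x / 10, math.sin(x / 10)) for x in range(200)]
samples += [(1.0, 9.0), (7.5, -8.0), (15.0, 10.0)]
reduced = fdpp(samples)
print(len(samples), "samples reduced to", len(reduced), "centroids")
```

The surviving centroids in `reduced` would then serve as the training set for whichever SVR trainer is in use. Whether a given outlier is actually filtered depends on the K-means initialisation and on the chosen `min_size`, so the thresholds here should be read as placeholders.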