A novel approach for missing data processing based on compounded PSO clustering

  • Authors:
  • Hung-Pin Chiu;Tsen-Jen Wei;Hsiang-Yi Lee

  • Affiliations:
  • Department of Information Management, Nan Hua University, Dalin, Chiayi, ROC (all three authors)

  • Venue:
  • WSEAS Transactions on Information Science and Applications
  • Year:
  • 2009

Abstract

Incomplete and noisy data significantly distort data mining results, so handling missing values and noisy data is crucial in data mining. Recent research has begun to exploit data clustering techniques to estimate missing values, and the quality of the clustering analysis strongly influences the performance of the missing data estimation. The clustering problem is known to be NP-hard. Particle swarm optimization (PSO) is a recently suggested heuristic search process for solving data clustering problems. In this paper, a compounded PSO (CPSO) clustering approach is proposed for missing value estimation. Normalization methods are first used to filter outliers and to prevent some attributes from dominating the clustering result. The K-means algorithm and a reflex mechanism are then combined with standard PSO clustering so that it can quickly converge to a reasonably good solution. Meanwhile, an iteration-based filling-in scheme guides the CPSO clustering search toward the optimal estimates. The effectiveness of the proposed approach is demonstrated on several data sets at four different rates of missing data. The empirical evaluation shows that CPSO outperforms the well-known K-means, PSO, and SOM-based approaches, making it a desirable method for solving missing value problems.
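The general idea of clustering-based estimation with iterative filling-in can be illustrated with a minimal sketch. It is not the CPSO method described in the abstract: it uses plain K-means only, with no PSO search and no reflex mechanism, and the function name `kmeans_impute`, its parameters, the min-max normalization, and the farthest-point initialization are all assumptions made for illustration.

```python
import numpy as np

def kmeans_impute(X, k=2, n_rounds=10, n_kmeans_iter=20):
    """Illustrative cluster-based imputation (plain K-means, NOT CPSO).

    Missing entries are marked with NaN. All names and defaults here are
    hypothetical and are not taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Min-max normalize each attribute over observed values only, so that
    # no single attribute dominates the distance computation.
    lo = np.nanmin(X, axis=0)
    hi = np.nanmax(X, axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    Z = (X - lo) / span

    # Initial fill: the mean of the observed values in each column.
    Z = np.where(missing, np.nanmean(Z, axis=0), Z)

    # Deterministic farthest-point initialization (an assumption of this
    # sketch): start from row 0, then repeatedly add the farthest row.
    centroids = [Z[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(Z - c, axis=1) for c in centroids], axis=0)
        centroids.append(Z[d.argmax()])
    centroids = np.array(centroids)

    for _ in range(n_rounds):
        # Lloyd iterations on the currently filled-in data.
        for _ in range(n_kmeans_iter):
            dist = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
            labels = dist.argmin(axis=1)
            for j in range(k):
                members = Z[labels == j]
                if len(members):
                    centroids[j] = members.mean(axis=0)
        # Filling-in step: replace each missing entry with the matching
        # coordinate of the centroid of its assigned cluster.
        Z = np.where(missing, centroids[labels], Z)

    # Map the result back to the original attribute scales.
    return Z * span + lo
```

Calling `kmeans_impute(X, k=2)` on an array whose rows form two well-separated groups fills each NaN with a value drawn from its own group's centroid, while observed entries are returned unchanged up to floating-point rounding.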