A novel approach for missing data processing based on compounded PSO clustering

Authors:
Hung-Pin Chiu;Tsen-Jen Wei;Hsiang-Yi Lee
Affiliations:
Department of Information Management, Nan Hua University, Dalin ChiaYi, ROC;Department of Information Management, Nan Hua University, Dalin ChiaYi, ROC;Department of Information Management, Nan Hua University, Dalin ChiaYi, ROC
Venue:
WSEAS Transactions on Information Science and Applications
Year:
2009

Abstract

Incomplete and noisy data significantly distort data mining results, so handling missing values and noisy data is crucial in data mining. Recent research has begun to exploit data clustering techniques to estimate missing values, and the quality of the clustering analysis significantly influences the performance of missing data estimation. The clustering problem has been proven to be NP-hard. Particle swarm optimization (PSO) is a recently suggested heuristic search process for solving data clustering problems. In this paper, a compounded PSO (CPSO) clustering approach is proposed for missing value estimation. Normalization methods are first used to filter outliers and to prevent some attributes from dominating the clustering result. The K-means algorithm and a reflex mechanism are then combined with standard PSO clustering so that it can quickly converge to a reasonably good solution. Meanwhile, an iteration-based filling-in scheme guides the CPSO clustering search toward the optimal estimated values. The effectiveness of the proposed approach is demonstrated on several data sets at four different rates of missing data. The empirical evaluation shows that CPSO outperforms the well-known K-means, PSO, and SOM-based approaches, making it a desirable method for solving missing value problems.
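
The general idea of clustering-based, iteration-based filling-in can be illustrated with a minimal Python sketch. This is not the CPSO method from the paper: plain K-means stands in for the PSO search, the reflex mechanism and outlier filtering are omitted, and min-max scaling is only one assumed choice of normalization. The function names (`min_max_normalize`, `kmeans`, `cluster_impute`) and all parameters are hypothetical, chosen for illustration.

```python
import numpy as np


def min_max_normalize(X):
    """Scale each column to [0, 1], ignoring NaNs (assumed normalization)."""
    lo = np.nanmin(X, axis=0)
    hi = np.nanmax(X, axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    return (X - lo) / span, lo, span


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd-style K-means; a stand-in for the PSO-based clustering."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    # Final assignment against the updated centers.
    dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1), centers


def cluster_impute(X, k, n_rounds=10, tol=1e-6, seed=0):
    """Iteratively refill missing entries with the centroid of their cluster."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    Z, lo, span = min_max_normalize(X)
    # Initial guess: column means of the observed (normalized) values.
    Z = np.where(missing, np.nanmean(Z, axis=0), Z)
    for _ in range(n_rounds):
        labels, centers = kmeans(Z, k, seed=seed)
        new_Z = np.where(missing, centers[labels], Z)
        change = np.abs(new_Z - Z).max()
        Z = new_Z
        if change < tol:  # estimates have stopped moving
            break
    return Z * span + lo  # map back to the original scale


if __name__ == "__main__":
    data = np.array([
        [1.0, 1.0], [1.2, 0.9], [0.9, 1.1],
        [10.0, 10.0], [10.2, 9.9], [9.8, np.nan],
    ])
    print(cluster_impute(data, k=2))
```

Each round re-clusters the data using the current estimates and then replaces every missing entry with the matching coordinate of its cluster centroid, so the estimates and the clustering refine each other. Observed entries are never overwritten.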