Missing value imputation based on data clustering

  • Authors:
  • Shichao Zhang; Jilian Zhang; Xiaofeng Zhu; Yongsong Qin; Chengqi Zhang

  • Affiliations:
  • Department of Computer Science, Guangxi Normal University, Guilin, China; School of Information Systems, Singapore Management University, Singapore; Department of Computer Science, Guangxi Normal University, Guilin, China; Department of Computer Science, Guangxi Normal University, Guilin, China; Faculty of Information Technology, University of Technology Sydney, Broadway, NSW, Australia

  • Venue:
  • Transactions on Computational Science I
  • Year:
  • 2008

Abstract

We propose an efficient nonparametric missing value imputation method based on clustering, called CMI (Clustering-based Missing value Imputation), for dealing with missing values in target attributes. In our approach, a kernel-based method imputes the missing values of an instance A with plausible values generated from the complete instances (those without missing values) that are most similar to A. Specifically, we first divide the dataset, including the instances with missing values, into clusters. Next, the missing values of A are filled in with plausible values generated from A's cluster. Extensive experiments show the effectiveness of the proposed method on the missing value imputation task.
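
The two steps described in the abstract (cluster all instances, then impute from the complete instances in the same cluster with a kernel) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name the clustering algorithm or the kernel, so plain k-means on the fully observed attributes and a Gaussian-kernel weighted average are assumed here, and the names `kmeans`, `cmi_impute`, `k`, and `bandwidth` are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (an assumed choice); returns one label per row."""
    rng = np.random.default_rng(seed)
    # Start from k distinct rows picked at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Squared distance from every row to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels


def cmi_impute(X, y, k=2, bandwidth=1.0, seed=0):
    """Fill NaNs in the target y using complete instances of the same cluster.

    X: (n, d) array of fully observed attributes (assumption).
    y: (n,) target attribute, with NaN marking a missing value;
       at least one value of y must be observed.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Step 1: cluster all instances, including those whose target is missing.
    labels = kmeans(X, k, seed=seed)

    observed = ~np.isnan(y)
    y_filled = y.copy()
    for i in np.flatnonzero(~observed):
        # Step 2: donors are the complete instances in the same cluster.
        donors = observed & (labels == labels[i])
        if not donors.any():
            donors = observed  # assumed fallback: use every complete instance
        d2 = ((X[donors] - X[i]) ** 2).sum(axis=1)
        # Gaussian kernel weights; subtracting the minimum distance only
        # rescales all weights by a constant and avoids underflow to zero.
        w = np.exp(-(d2 - d2.min()) / (2.0 * bandwidth ** 2))
        y_filled[i] = np.dot(w, y[donors]) / w.sum()
    return y_filled


if __name__ == "__main__":
    # Toy data: two well-separated groups, one missing target in each.
    X = np.array([[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]])
    y = np.array([1.0, 1.0, np.nan, 5.0, 5.0, np.nan])
    print(cmi_impute(X, y, k=2))
```

In this sketch the cluster acts as a coarse filter on which instances may serve as donors, and the kernel then weights those donors by similarity to A, so closer complete instances contribute more to the imputed value. The bandwidth and the number of clusters are free parameters that would need tuning on real data.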