Missing value imputation based on data clustering

Authors:
Shichao Zhang;Jilian Zhang;Xiaofeng Zhu;Yongsong Qin;Chengqi Zhang
Affiliations:
Department of Computer Science, Guangxi Normal University, Guilin, China;School of Information Systems, Singapore Management University, Singapore;Department of Computer Science, Guangxi Normal University, Guilin, China;Department of Computer Science, Guangxi Normal University, Guilin, China;Faculty of Information Technology, University of Technology Sydney, Broadway, NSW, Australia
Venue:
Transactions on computational science I
Year:
2008

Abstract

We propose CMI (Clustering-based Missing value Imputation), an efficient nonparametric method for imputing missing values in target attributes. For an instance A with missing values, CMI uses a kernel-based method to generate plausible values from the complete instances (those without missing values) that are most similar to A. Specifically, we first partition the dataset, including the instances with missing values, into clusters. We then fill in the missing values of A with plausible values generated from the complete instances in A's cluster. Extensive experiments demonstrate the effectiveness of the proposed method on the missing value imputation task.
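The two steps described in the abstract (cluster all instances, then impute from complete instances in the same cluster with a kernel) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name the clustering algorithm, the kernel, or the bandwidth, so the sketch assumes plain k-means on the fully observed attributes, a Gaussian kernel, and a kernel-weighted average of the observed target values. The names `kmeans`, `cmi_impute`, `k`, and `bandwidth` are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (an assumed choice); returns one label per row."""
    rng = np.random.default_rng(seed)
    # Fancy indexing copies the rows, so updating centers leaves X untouched.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def cmi_impute(X, y, k=2, bandwidth=1.0, seed=0):
    """Fill NaN entries of the target y using same-cluster complete instances.

    X holds the fully observed attributes; y is the target attribute, with
    NaN marking a missing value. Gaussian kernel and bandwidth are assumptions.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    missing = np.isnan(y)
    if missing.all():
        raise ValueError("no complete instances to impute from")

    # Step 1: cluster all instances, including those with a missing target.
    labels = kmeans(X, k, seed=seed)

    y_filled = y.copy()
    for i in np.flatnonzero(missing):
        # Step 2: donors are complete instances in the same cluster.
        donors = (labels == labels[i]) & ~missing
        if not donors.any():
            donors = ~missing  # fallback (assumption): use all complete rows
        d2 = ((X[donors] - X[i]) ** 2).sum(axis=1)
        w = np.exp(-d2 / (2.0 * bandwidth ** 2))
        if w.sum() <= 0.0:
            w = np.ones_like(w)  # weights underflowed: fall back to the mean
        # Kernel-weighted average of the donors' observed target values.
        y_filled[i] = np.dot(w, y[donors]) / w.sum()
    return y_filled


# Toy data: two well-separated groups, one missing target value in each.
X_demo = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
                   [10.0, 10.0], [10.1, 10.0], [10.0, 10.1], [10.1, 10.1]])
y_demo = np.array([1.0, 1.0, 1.0, np.nan, 5.0, 5.0, np.nan, 5.0])
filled = cmi_impute(X_demo, y_demo, k=2, bandwidth=1.0)
```

Restricting the donors to one cluster keeps each kernel estimate local and avoids computing distances to every complete instance in the dataset, which is presumably where the efficiency claimed in the abstract comes from.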