Missing Value Estimation for Mixed-Attribute Data Sets

Authors:
Xiaofeng Zhu;Shichao Zhang;Zhi Jin;Zili Zhang;Zhuoming Xu
Affiliations:
University of Technology Sydney, Sydney, Australia;Zhejiang Normal University, Jinhua, China;Beijing University, Beijing, China;Southwest University, Chongqing, China;Hohai University, Nanjing, China
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2011

Citing 0
Cited 6

SKIF: a data imputation framework for concept drifting data streams
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management

Missing data imputation by utilizing information within incomplete instances
Journal of Systems and Software

Nearest neighbor selection for iteratively kNN imputation
Journal of Systems and Software

WebPut: efficient web-based data imputation
WISE'12 Proceedings of the 13th international conference on Web Information Systems Engineering

Locally linear reconstruction based missing value imputation for supervised learning
Neurocomputing

Missing value imputation using decision trees and decision forests by splitting and merging records: Two novel techniques
Knowledge-Based Systems

Abstract

Missing data imputation is a key issue in learning from incomplete data. Various techniques have been developed with great success for handling missing values in data sets with homogeneous attributes (their independent attributes are all either continuous or discrete). This paper studies a new setting of missing data imputation: imputing missing data in data sets with heterogeneous attributes (their independent attributes are of different types), referred to as imputing mixed-attribute data sets. Although many real applications fall into this setting, no estimator has been designed for imputing mixed-attribute data sets. This paper first proposes two consistent estimators, one for discrete and one for continuous missing target values. It then proposes a mixture-kernel-based iterative estimator to impute mixed-attribute data sets. The proposed method is evaluated in extensive experiments against some typical algorithms, and the results show that it outperforms these existing imputation methods in terms of classification accuracy and root mean square error (RMSE) at different missing ratios.
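
To make the idea of a kernel over mixed attributes concrete, the following is a minimal sketch and not the estimator defined in the paper. It assumes a product kernel (a Gaussian kernel on the continuous attributes times a simple mismatch-penalty kernel on the discrete ones) and a single-pass, kernel-weighted estimate; the iterative refinement mentioned in the abstract is omitted. All function names and the parameters `h` (bandwidth) and `lam` (mismatch penalty) are hypothetical choices made for illustration.

```python
import numpy as np


def mixed_kernel(x_cont, x_disc, X_cont, X_disc, h=1.0, lam=0.2):
    """Weight of every complete row relative to one query row (assumed product kernel)."""
    X_cont = np.asarray(X_cont, dtype=float)
    X_disc = np.asarray(X_disc)
    # Gaussian kernel on the continuous attributes.
    sq_dist = ((X_cont - np.asarray(x_cont, dtype=float)) ** 2).sum(axis=1)
    k_cont = np.exp(-sq_dist / (2.0 * h ** 2))
    # Discrete kernel: each mismatching attribute scales the weight by lam (0 <= lam <= 1).
    mismatches = (X_disc != np.asarray(x_disc)).sum(axis=1)
    k_disc = lam ** mismatches
    return k_cont * k_disc


def impute_continuous(x_cont, x_disc, X_cont, X_disc, y, h=1.0, lam=0.2):
    """Kernel-weighted average of observed continuous targets."""
    y = np.asarray(y, dtype=float)
    w = mixed_kernel(x_cont, x_disc, X_cont, X_disc, h, lam)
    total = w.sum()
    if total == 0.0:
        # All weights vanished: fall back to the plain mean of the observed targets.
        return float(y.mean())
    return float(np.dot(w, y) / total)


def impute_discrete(x_cont, x_disc, X_cont, X_disc, y, h=1.0, lam=0.2):
    """Kernel-weighted vote over observed discrete targets."""
    y = np.asarray(y)
    w = mixed_kernel(x_cont, x_disc, X_cont, X_disc, h, lam)
    labels = np.unique(y)
    scores = [w[y == label].sum() for label in labels]
    return labels[int(np.argmax(scores))]


def rmse(true_values, estimates):
    """Root mean square error between true and imputed continuous values."""
    diff = np.asarray(true_values, dtype=float) - np.asarray(estimates, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))
```

In this sketch, rows that are close in the continuous attributes and agree on the discrete attributes dominate the estimate. An iterative variant would re-estimate each missing value using the values imputed in the previous round, which is outside the scope of this example.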