Dynamic Clustering-Based Estimation of Missing Values in Mixed Type Data

Authors:
Vadim V. Ayuyev;Joseph Jupin;Philip W. Harris;Zoran Obradovic
Affiliations:
FN1-KF Department, Bauman Moscow State Technical University (Kaluga Branch), Kaluga, Russian Federation 248600;Center for Information Science and Technology, Temple University, Philadelphia, USA 19122;Department of Criminal Justice, Temple University, Philadelphia, USA 19122;Center for Information Science and Technology, Temple University, Philadelphia, USA 19122
Venue:
DaWaK '09 Proceedings of the 11th International Conference on Data Warehousing and Knowledge Discovery
Year:
2009

Abstract

The choice of imputation method becomes especially important when the fraction of missing values is large and the data are of mixed type. The proposed dynamic clustering imputation (DCI) algorithm relies on similarity information from shared neighbors and considers mixed-type variables jointly. When evaluated on a public social science dataset of 46,043 mixed-type instances with up to 33% missing values, DCI improved imputation accuracy by more than 20% over Multiple Imputation, Predictive Mean Matching, Linear and Multilevel Regression, and Mean Mode Replacement methods. Data imputed by the six methods were then used in prediction tests with NB-Tree, Random Subset Selection, and Neural Network-based classification models. In our experiments, classification accuracy on DCI-preprocessed data was much better than on data preprocessed with the alternative imputation methods.
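
The abstract does not spell out the DCI procedure, so the following is not the authors' algorithm. It is only a minimal sketch of the general idea of neighbor-based imputation over mixed-type data, set against the Mean Mode Replacement baseline named above. The Gower-style distance, the toy records, and the function names (`mean_mode_fill`, `mixed_distance`, `neighbor_impute`) are all assumptions made for illustration.

```python
from collections import Counter
from statistics import mean


def mean_mode_fill(rows, col, numeric):
    """Baseline: column mean (numeric) or mode (categorical) of observed values."""
    observed = [r[col] for r in rows if r[col] is not None]
    return mean(observed) if numeric else Counter(observed).most_common(1)[0][0]


def mixed_distance(a, b, numeric_cols, ranges):
    """Gower-style distance (an assumption, not taken from the paper).

    Numeric attributes contribute a range-normalized absolute difference,
    categorical attributes contribute 0 on a match and 1 otherwise.
    Attributes missing in either row are skipped.
    """
    total, used = 0.0, 0
    for j, (x, y) in enumerate(zip(a, b)):
        if x is None or y is None:
            continue
        if j in numeric_cols:
            total += abs(x - y) / ranges[j] if ranges[j] else 0.0
        else:
            total += 0.0 if x == y else 1.0
        used += 1
    return total / used if used else float("inf")


def neighbor_impute(rows, numeric_cols, k=2):
    """Fill each missing cell from the k nearest rows that observe that column."""
    ranges = {}
    for j in numeric_cols:
        vals = [r[j] for r in rows if r[j] is not None]
        ranges[j] = max(vals) - min(vals)
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = [r for n, r in enumerate(rows) if n != i and r[j] is not None]
            donors.sort(key=lambda r: mixed_distance(row, r, numeric_cols, ranges))
            nearest = [r[j] for r in donors[:k]]
            if not nearest:
                continue  # no donor observes this column; leave the gap
            if j in numeric_cols:
                filled[i][j] = mean(nearest)
            else:
                filled[i][j] = Counter(nearest).most_common(1)[0][0]
    return filled


# Hypothetical toy records: (age, area, answer); None marks a missing value.
rows = [
    (25, "urban", "yes"),
    (27, "urban", None),
    (30, "urban", "yes"),
    (60, "rural", "no"),
    (None, "rural", "no"),
    (58, "rural", "no"),
]
filled = neighbor_impute(rows, numeric_cols={0}, k=2)
print(filled[1], filled[4])
print(mean_mode_fill(rows, 0, numeric=True), mean_mode_fill(rows, 2, numeric=False))
```

On this toy data the baseline fills every missing answer with the global mode and every missing age with the global mean, while the neighbor-based fill draws on the records most similar across both the numeric and the categorical attributes. That is the kind of difference a method that treats mixed-type variables together is meant to exploit.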