A Missing Data Estimation Analysis in Type II Diabetes Databases

Authors:
Marisol Giardina;Yongyang Huo;Francisco Azuaje;Paul McCullagh;Roy Harper
Affiliations:
University of Ulster;University of Ulster;University of Ulster;University of Ulster;UK Ulster Community & Hospitals Trust
Venue:
CBMS '05 Proceedings of the 18th IEEE Symposium on Computer-Based Medical Systems
Year:
2005

Cited by:
Instance driven clustering for the imputation of missing data in KDD (International Journal of Communication Networks and Distributed Systems)

Abstract

Type II diabetes is one of the most common causes of disability and death in the United Kingdom. This investigation analysed data acquired from diabetic patients at the Ulster Hospital in Northern Ireland in terms of descriptive statistics and missing values. Such data are noisy and incomplete. This paper reports a comprehensive missing data estimation analysis. Five missing value imputation methods were compared, including k-Nearest Neighbours (k-NN) and correlation-based estimation models. The analysis indicates that a feature-based correlation method known as EMImpute_Columns is a promising approach to estimating missing values. Nevertheless, k-NN methods may be useful for providing relatively accurate estimations with lower error variability. These estimation techniques will support the implementation of supervised and unsupervised learning tools for risk assessment of coronary heart disease, a major complication of diabetes.
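
To make the k-NN idea mentioned in the abstract concrete, the following is a minimal sketch of one common form of k-NN imputation, together with a root-mean-square error (RMSE) measure of the kind often used to compare imputation methods. It is an illustration under stated assumptions, not the authors' implementation: the function names `knn_impute` and `rmse`, the distance measure (root mean squared difference over features observed in both records), the neighbour averaging rule, and the toy values are all hypothetical, and the abstract does not state which k-NN variant or error measure the paper used.

```python
import math


def knn_impute(rows, k=2):
    """Return a copy of rows with None entries filled by k-NN averaging.

    Assumption: a missing feature is estimated as the mean of that
    feature over the k closest records that have it observed.
    """
    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                # A neighbour must be a different record with feature j observed.
                if m == i or other[j] is None:
                    continue
                # Compare only the features observed in both records.
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(
                    sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            if candidates:
                candidates.sort(key=lambda c: c[0])
                nearest = candidates[:k]
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed


def rmse(true_values, estimates):
    """Root-mean-square error between held-out true values and estimates."""
    squared = [(t - e) ** 2 for t, e in zip(true_values, estimates)]
    return math.sqrt(sum(squared) / len(squared))


# Toy records (made-up numbers, not patient data); None marks a missing value.
records = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, None],
    [5.0, 6.0, 7.0],
    [0.9, 1.9, 2.9],
]
completed = knn_impute(records, k=2)
```

In a comparison study of this kind, one would typically hide some known values, impute them with each method, and compare the resulting `rmse` values and their variability across repeated runs. In practice the features would usually be standardised first so that no single clinical variable dominates the distance.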