Hot deck methods for imputing missing data: the effects of limiting donor usage

Authors:
Dieter William Joenssen; Udo Bankhofer
Affiliations:
Fachgebiet für Quantitative Methoden, Technische Universität Ilmenau, Ilmenau, Germany;Fachgebiet für Quantitative Methoden, Technische Universität Ilmenau, Ilmenau, Germany
Venue:
MLDM'12 Proceedings of the 8th international conference on Machine Learning and Data Mining in Pattern Recognition
Year:
2012

Abstract

Within the data mining context, missing data methods are constrained in computational complexity by the large amounts of data involved. Hot deck procedures are among the computationally simple yet effective imputation methods. Hot deck methods impute missing values within a data matrix by using available values from the same matrix. The object from which these available values are taken to impute another object is called the donor. Because values are replicated, a single donor may be selected to serve multiple recipients. The inherent risk is that too many, or even all, missing values are imputed with the values from a single donor. To mitigate this risk, some hot deck variants limit the number of times any one donor may be selected to donate its values. This raises the question of the conditions under which such a limitation is sensible. This study aims to answer this question through an extensive simulation. The results show rather clear differences between imputations by hot deck methods in which the donor limit was varied. In addition to these differences, influencing factors are identified that determine whether or not a donor limit is sensible.
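
The donor-limit idea described in the abstract can be illustrated with a minimal sketch. It assumes a random hot deck, in which donors are drawn uniformly from the complete rows. The abstract does not say which hot deck variants the study simulates, so this choice, the function name `random_hot_deck`, and its parameters are all hypothetical and not taken from the paper.

```python
import random
from collections import Counter


def random_hot_deck(rows, donor_limit=None, seed=0):
    """Impute None entries by copying values from randomly chosen complete rows.

    rows        : list of lists; None marks a missing value
    donor_limit : maximum number of recipients one donor may serve
                  (None means unlimited); hypothetical parameter name
    Returns (imputed_rows, usage); usage counts how often each donor row was used.
    """
    rng = random.Random(seed)
    # Assumption: only fully observed rows may act as donors.
    donors = [i for i, row in enumerate(rows) if None not in row]
    recipients = [i for i, row in enumerate(rows) if None in row]
    if recipients and not donors:
        raise ValueError("no complete rows available as donors")

    usage = Counter()
    imputed = [list(row) for row in rows]

    for r in recipients:
        # Donors that have reached the limit are no longer eligible.
        eligible = [d for d in donors
                    if donor_limit is None or usage[d] < donor_limit]
        if not eligible:
            raise ValueError("donor limit too strict for the number of recipients")
        d = rng.choice(eligible)
        usage[d] += 1
        # Observed values are kept; only the missing entries are filled.
        imputed[r] = [rows[d][j] if v is None else v
                      for j, v in enumerate(rows[r])]
    return imputed, usage


data = [
    [1.0, 2.0],    # complete row, potential donor
    [3.0, 4.0],    # complete row, potential donor
    [None, 5.0],
    [6.0, None],
    [None, 7.0],
    [8.0, None],
]
imputed, usage = random_hot_deck(data, donor_limit=2, seed=1)
```

With two donors, four recipients, and `donor_limit=2`, each donor must serve exactly two recipients. Without a limit, one donor could by chance supply all four imputed values, which is the risk the abstract describes.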