Iterative imputation algorithms for process modeling with incomplete data

Authors:
Samuel H. Huang;Ranganath Kothamasu;Niharika Rapur
Affiliations:
Intelligent Systems Laboratory, Department of Mechanical, Industrial and Nuclear Engineering, University of Cincinnati, Cincinnati, OH 45221, USA (all three authors)
Venue:
Intelligent Data Analysis
Year:
2007

Abstract

Modeling with real-world data is often plagued by missing values, which limit the applicability and validity of the developed model. Several algorithms exist in the literature to facilitate the analysis of incomplete data by imputing missing values. However, their imputation accuracy and practical applicability have not been systematically compared and studied, which makes the choice of an appropriate imputation method difficult. This paper conducts an exploratory analysis of popular missing-data imputation algorithms. A new imputation algorithm based on clustering is also developed and demonstrated to be useful in a variety of ways for improving the efficiency of imputing missing values. These algorithms are benchmarked using datasets with significantly varying statistical properties. Based on the empirical results and theoretical analysis, a set of guidelines is proposed to assist in the selection of an appropriate imputation algorithm for a specific application. Finally, these guidelines are applied in a process modeling case study that involves the analysis of the design of an atomizer. The imputed values were observed to be qualitatively valid, thus providing evidence for the appropriateness of the proposed guidelines.