Iterative imputation algorithms for process modeling with incomplete data

  • Authors:
  • Samuel H. Huang;Ranganath Kothamasu;Niharika Rapur

  • Affiliations:
  • Intelligent Systems Laboratory, Department of Mechanical, Industrial and Nuclear Engineering, University of Cincinnati, Cincinnati, OH 45221, USA;Intelligent Systems Laboratory, Department of Mechanical, Industrial and Nuclear Engineering, University of Cincinnati, Cincinnati, OH 45221, USA;Intelligent Systems Laboratory, Department of Mechanical, Industrial and Nuclear Engineering, University of Cincinnati, Cincinnati, OH 45221, USA

  • Venue:
  • Intelligent Data Analysis
  • Year:
  • 2007

Abstract

Modeling with real-world data is often plagued by missing values, which limit the applicability and validity of the resulting model. Several algorithms exist in the literature to facilitate the analysis of incomplete data by imputing missing values. However, their imputation accuracy and practical applicability have not been systematically compared, which makes it difficult to choose an appropriate imputation method. This paper presents an exploratory analysis of popular missing-data imputation algorithms. A new clustering-based imputation algorithm is also developed and shown to improve the efficiency of imputing missing values in several ways. The algorithms are benchmarked on datasets with significantly varying statistical properties. Based on the empirical results and theoretical analysis, a set of guidelines is proposed to assist in selecting an appropriate imputation algorithm for a specific application. Finally, these guidelines are applied in a process modeling case study that analyzes the design of an atomizer. The imputed values were observed to be qualitatively valid, providing evidence for the appropriateness of the proposed guidelines.