Multiple imputation in principal component analysis

  • Authors:
  • Julie Josse;Jérôme Pagès;François Husson

  • Affiliations:
  • Agrocampus Ouest, Rennes, France 35042;Agrocampus Ouest, Rennes, France 35042;Agrocampus Ouest, Rennes, France 35042

  • Venue:
  • Advances in Data Analysis and Classification
  • Year:
  • 2011

Abstract

The available methods to handle missing values in principal component analysis only provide point estimates of the parameters (axes and components) and estimates of the missing values. To take into account the variability due to missing values, a multiple imputation method is proposed. First, a method to generate multiple imputed data sets from a principal component analysis model is defined. Then, two ways to visualize the uncertainty due to missing values on the principal component analysis results are described. The first consists of projecting the imputed data sets onto a reference configuration as supplementary elements to assess the stability of the individuals (respectively, of the variables). The second consists of performing a principal component analysis on each imputed data set and fitting each resulting configuration onto the reference one by Procrustes rotation. The latter strategy makes it possible to assess the variability of the principal component analysis parameters induced by the missing values. The methodology is then evaluated on a real data set.
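
The workflow outlined in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' algorithm: the iterative low-rank imputation, the noise model (Gaussian residual noise drawn around a single fitted model, with no degrees-of-freedom correction) and all function names are assumptions made for illustration. It shows the three ingredients the abstract names: generating several imputed data sets from a PCA model, projecting them onto a reference configuration as supplementary elements, and fitting a per-data-set configuration onto the reference one with an orthogonal Procrustes rotation.

```python
import numpy as np

def iterative_pca_impute(X, n_components=2, n_iter=200, tol=1e-8):
    """Fill NaNs by alternating a rank-k SVD fit and re-imputation (assumed scheme)."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Start from column means of the observed values.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        mean = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mean, full_matrices=False)
        fitted = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components] + mean
        updated = np.where(missing, fitted, X)  # observed cells are never changed
        converged = np.sum((updated - filled) ** 2) < tol
        filled = updated
        if converged:
            break
    return filled, fitted, missing

def multiple_impute(X, n_components=2, n_imputations=20, seed=0):
    """Return the point-imputed (reference) data set and several noisy imputations."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    filled, fitted, missing = iterative_pca_impute(X, n_components)
    # Residual spread estimated on observed cells only (simplifying assumption).
    sigma = np.sqrt(np.mean((X - fitted)[~missing] ** 2))
    imputed_sets = []
    for _ in range(n_imputations):
        draw = filled.copy()
        draw[missing] = fitted[missing] + rng.normal(0.0, sigma, missing.sum())
        imputed_sets.append(draw)
    return filled, imputed_sets

def pca_fit(X, n_components=2):
    """Return individual scores, principal axes (rows) and column means."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    axes = Vt[:n_components]
    return (X - mean) @ axes.T, axes, mean

def project_supplementary(X_imputed, axes, mean):
    """Project an imputed data set onto the reference axes."""
    return (X_imputed - mean) @ axes.T

def procrustes_rotate(A, B):
    """Rotate configuration A onto B with the orthogonal R minimizing ||A R - B||."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    return A @ (U @ Vt)
```

A typical use would compute the reference configuration with `pca_fit` on the point-imputed data set, then either call `project_supplementary` on each imputed data set, or run `pca_fit` on each one and align its scores to the reference scores with `procrustes_rotate`. The spread of the resulting point clouds around each individual gives a visual impression of the uncertainty due to the missing values.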