Multiple imputation in principal component analysis

Authors:
Julie Josse;Jérôme Pagès;François Husson
Affiliations:
Agrocampus Ouest, Rennes, France 35042;Agrocampus Ouest, Rennes, France 35042;Agrocampus Ouest, Rennes, France 35042
Venue:
Advances in Data Analysis and Classification
Year:
2011

Abstract

The available methods for handling missing values in principal component analysis provide only point estimates of the parameters (axes and components) and of the missing values. To take into account the variability due to missing values, a multiple imputation method is proposed. First, a method for generating multiple imputed data sets from a principal component analysis model is defined. Then, two ways of visualizing the uncertainty due to missing values on the principal component analysis results are described. The first consists of projecting the imputed data sets onto a reference configuration as supplementary elements to assess the stability of the individuals (respectively, of the variables). The second consists of performing a principal component analysis on each imputed data set and fitting each resulting configuration onto the reference one by Procrustes rotation. The latter strategy makes it possible to assess the variability of the principal component analysis parameters induced by the missing values. The methodology is then evaluated on a real data set.
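
The two visualization strategies described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' procedure. It assumes a simplified imputation scheme: an iterative rank-k SVD fit produces the point estimate, and each imputed data set adds Gaussian noise to the fitted values of the missing cells, with the noise level estimated from residuals on the observed cells. That scheme ignores the uncertainty of the estimated axes and components. All function names and default parameters are hypothetical.

```python
import numpy as np


def iterative_pca_impute(X, k=2, n_iter=200, tol=1e-8):
    """Point estimate: alternate a rank-k SVD fit with re-imputation of NaNs."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Start from column means computed on the observed cells.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        mean = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mean, full_matrices=False)
        fitted = (U[:, :k] * s[:k]) @ Vt[:k] + mean
        new = np.where(missing, fitted, X)
        converged = np.max(np.abs(new - filled)) < tol
        filled = new
        if converged:
            break
    return filled, fitted, missing


def multiple_impute(X, k=2, n_imputations=20, seed=0):
    """Simplified multiple imputation (assumption): fitted value + Gaussian noise."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    filled, fitted, missing = iterative_pca_impute(X, k)
    # Residual spread on observed cells serves as the noise level.
    sigma = (X - fitted)[~missing].std()
    draws = []
    for _ in range(n_imputations):
        noise = rng.normal(0.0, sigma, size=X.shape)
        draws.append(np.where(missing, fitted + noise, X))
    return filled, draws


def pca_scores(X, k=2):
    """Centered PCA via SVD: returns scores, axes (k x p), and column means."""
    mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return U[:, :k] * s[:k], Vt[:k], mean


def project_supplementary(X_imputed, axes, mean):
    """Strategy 1: project an imputed data set onto the reference axes."""
    return (X_imputed - mean) @ axes.T


def procrustes_align(A, B):
    """Strategy 2: rotate A onto B, minimizing ||A R - B||_F over orthogonal R."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    return A @ (U @ Vt)


def uncertainty_clouds(X, k=2, n_imputations=20, seed=0):
    """Return reference scores and one point cloud per strategy."""
    filled, draws = multiple_impute(X, k, n_imputations, seed)
    ref_scores, ref_axes, ref_mean = pca_scores(filled, k)
    supplementary = np.stack(
        [project_supplementary(d, ref_axes, ref_mean) for d in draws]
    )
    rotated = np.stack(
        [procrustes_align(pca_scores(d, k)[0], ref_scores) for d in draws]
    )
    return ref_scores, supplementary, rotated
```

Under these assumptions, both returned arrays have shape (number of imputations, number of individuals, k). Plotting each individual's points from `supplementary` or `rotated` around its reference position gives a rough picture of how much the missing values move it in the principal plane.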