Selecting the number of components in principal component analysis using cross-validation approximations

  • Authors:
  • Julie Josse;François Husson

  • Affiliations:
  • -;-

  • Venue:
  • Computational Statistics & Data Analysis
  • Year:
  • 2012

Quantified Score

Hi-index 0.03

Visualization

Abstract

Cross-validation is a tried and tested approach to select the number of components in principal component analysis (PCA), however, its main drawback is its computational cost. In a regression (or in a non parametric regression) setting, criteria such as the general cross-validation one (GCV) provide convenient approximations to leave-one-out cross-validation. They are based on the relation between the prediction error and the residual sum of squares weighted by elements of a projection matrix (or a smoothing matrix). Such a relation is then established in PCA using an original presentation of PCA with a unique projection matrix. It enables the definition of two cross-validation approximation criteria: the smoothing approximation of the cross-validation criterion (SACV) and the GCV criterion. The method is assessed with simulations and gives promising results.