How many principal components? Stopping rules for determining the number of non-trivial axes revisited

Authors:
Pedro R. Peres-Neto;Donald A. Jackson;Keith M. Somers
Affiliations:
Department of Zoology, University of Toronto, Toronto, ON, Canada M5S 3G5 (all three authors)
Venue:
Computational Statistics & Data Analysis
Year:
2005

Abstract

Principal component analysis is one of the most widely applied tools for summarizing common patterns of variation among variables. Several studies have investigated the ability of individual methods, or compared the performance of several methods, to determine the number of components describing the common variance of simulated data sets. We identify a number of shortcomings in these studies and conduct an extensive simulation study in which we compare a larger number of the available rules and develop some new methods. In total, we compare 20 stopping rules and propose a two-step approach that appears to be highly effective. First, Bartlett's test is used to test the significance of the first principal component, indicating whether or not at least two variables share common variation in the entire data set. If this test is significant, a number of different rules can then be applied to estimate the number of non-trivial components to retain. However, the relative merits of these rules depend on whether the data contain strongly correlated or uncorrelated variables. We also estimate the number of non-trivial components for several field data sets to evaluate how well our conclusions, drawn from simulated data, apply in practice.
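The two-step approach described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the first step is the standard Bartlett test of sphericity on the correlation matrix (null hypothesis: no pair of variables shares common variation) at a hypothetical significance level `alpha = 0.05`. For the second step it uses the broken-stick model as one illustrative rule among the many compared; the abstract does not single out a rule, and notes that their merits depend on the correlation structure of the data. The function names `bartlett_sphericity`, `broken_stick`, and `n_components` are hypothetical.

```python
import numpy as np
from scipy import stats


def bartlett_sphericity(X):
    """Bartlett's test of sphericity on the correlation matrix of X (n x p).

    H0: the population correlation matrix is the identity, i.e. no pair of
    variables shares common variation. Returns (statistic, df, p-value).
    """
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    # slogdet is numerically safer than log(det(R)) when |R| is tiny
    _, logdet = np.linalg.slogdet(R)
    stat = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    df = p * (p - 1) / 2.0
    pval = stats.chi2.sf(stat, df)
    return stat, df, pval


def broken_stick(p):
    """Expected variance proportions under the broken-stick model.

    Element k (0-based) is (1/p) * sum_{i=k+1}^{p} 1/i; the values sum to 1.
    """
    inv = 1.0 / np.arange(1, p + 1)
    return np.cumsum(inv[::-1])[::-1] / p


def n_components(X, alpha=0.05):
    """Hypothetical two-step rule: Bartlett's test, then the broken-stick model."""
    _, _, pval = bartlett_sphericity(X)
    if pval >= alpha:
        return 0  # no evidence of shared variation: retain no axes
    R = np.corrcoef(X, rowvar=False)
    eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]  # largest first
    observed = eigvals / eigvals.sum()
    expected = broken_stick(len(eigvals))
    # Retain leading components while the observed proportion of variance
    # exceeds the broken-stick expectation; stop at the first failure.
    k = 0
    for obs, exp in zip(observed, expected):
        if obs <= exp:
            break
        k += 1
    return k
```

For example, six variables formed as two blocks of three, each block driven by its own latent factor plus modest noise, should pass the sphericity test and retain two non-trivial axes. Stopping at the first component that fails the comparison, rather than counting every component that exceeds its expectation, keeps the retained set contiguous from the first axis.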