Sparse principal component analysis by choice of norm

Authors:
Xin Qi; Ruiyan Luo; Hongyu Zhao
Affiliations:
Department of Mathematics and Statistics, Georgia State University, 30 Pryor Street, Atlanta, GA 30303-3083, United States; Department of Mathematics and Statistics, Georgia State University, 30 Pryor Street, Atlanta, GA 30303-3083, United States; Department of Epidemiology and Public Health, Yale University, New Haven, CT 06520-8034, United States
Venue:
Journal of Multivariate Analysis
Year:
2013

Abstract

Recent years have seen the development of several methods for sparse principal component analysis, motivated by its importance in the analysis of high-dimensional data. Although these methods have proved useful in practical applications, they are limited by a lack of orthogonality among the loadings (coefficients) of different principal components, correlation among the principal components, expensive computation, and a lack of theoretical results such as consistency in high-dimensional settings. In this paper, we propose a new sparse principal component analysis method that replaces the usual norm in the traditional eigenvalue problem with a new norm, and we propose an efficient iterative algorithm for solving the resulting optimization problems. With this method, we can efficiently obtain uncorrelated principal components or orthogonal loadings, and explain a high percentage of the variation with sparse linear combinations. Owing to the strict convexity of the new norm, we can prove that the iterative method converges and give a detailed characterization of its limits. We also prove that the obtained principal component is consistent for a single-component model in high-dimensional settings. As an illustration, we apply the method to real gene expression data and obtain competitive results.
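
The abstract does not spell out the new norm or the iterative algorithm, so the sketch below is only a generic illustration of the stated goal: explaining a high percentage of the variation with a sparse linear combination. It uses soft-thresholded power iteration, a swapped-in technique, and should not be read as the method proposed in the paper. The function name `sparse_leading_loading`, the `threshold` parameter, and the simulated single-component data are all hypothetical choices made for this example.

```python
import numpy as np


def soft_threshold(x, t):
    """Shrink each entry of x toward zero by t; entries with |x| <= t become 0."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def sparse_leading_loading(S, threshold, n_iter=200, tol=1e-10):
    """Generic soft-thresholded power iteration on a covariance matrix S.

    Illustrative stand-in only (NOT the paper's algorithm): each step
    multiplies by S, soft-thresholds the result, and rescales to unit length.
    """
    # Start from the ordinary (dense) leading eigenvector of S.
    _, eigvecs = np.linalg.eigh(S)
    v = eigvecs[:, -1]
    for _ in range(n_iter):
        w = soft_threshold(S @ v, threshold)
        norm = np.linalg.norm(w)
        if norm == 0.0:
            raise ValueError("threshold too large: all loadings were zeroed")
        w = w / norm
        converged = np.linalg.norm(w - v) < tol
        v = w
        if converged:
            break
    return v


# Simulated single-component data (assumed setup): only the first 5 of
# 30 variables carry signal, the rest are pure noise.
rng = np.random.default_rng(0)
n, p = 200, 30
true_loading = np.zeros(p)
true_loading[:5] = 1.0 / np.sqrt(5)
scores = rng.normal(scale=3.0, size=n)
X = np.outer(scores, true_loading) + rng.normal(scale=0.5, size=(n, p))
X -= X.mean(axis=0)
S = X.T @ X / (n - 1)  # sample covariance matrix

v = sparse_leading_loading(S, threshold=0.5)
support = np.flatnonzero(v)           # indices of nonzero loadings
explained = (v @ S @ v) / np.trace(S)  # fraction of total variance explained

print("nonzero loadings at indices:", support)
print("fraction of variance explained:", round(float(explained), 3))
```

A larger `threshold` yields a sparser loading vector at the cost of explained variance. This simple scheme carries none of the guarantees claimed in the abstract, such as orthogonal loadings, uncorrelated components, or proven convergence.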