Kernel principal component analysis for large scale data set

  • Authors:
  • Haixian Wang; Zilan Hu; Yu’e Zhao

  • Affiliations:
  • Research Center for Learning Science, Southeast University, Nanjing, Jiangsu, P.R. China; School of Mathematics and Physics, Anhui University of Technology, Maanshan, Anhui, P.R. China; Department of Mathematics, Qingdao University, Qingdao, Shandong, P.R. China

  • Venue:
  • ICIC'06: Proceedings of the 2006 International Conference on Intelligent Computing - Volume Part I
  • Year:
  • 2006

Abstract

Kernel principal component analysis (KPCA) is an extremely powerful approach to extracting nonlinear features via the kernel trick, and it has been suggested for a number of applications. While the nonlinearity is afforded by the use of Mercer kernels, standard KPCA can only process a limited number of training samples: for a large-scale data set, it suffers from the computational burden of diagonalizing a large kernel matrix and from the corresponding storage requirements. In this paper, by choosing a subset of the entire training set using Gram-Schmidt orthonormalization and incomplete Cholesky decomposition, we formulate KPCA as the eigenvalue problem of a matrix whose size is much smaller than that of the kernel matrix. Theoretical analysis and experimental results on both artificial and real data show the advantages of the proposed method in terms of computational efficiency and storage space, especially when the number of data points is large.
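
The abstract does not give the algorithmic details, but the core idea, replacing the n×n kernel eigenproblem with a much smaller m×m one built from a subset of the samples, can be sketched. Below is a minimal NumPy sketch assuming an RBF kernel, with subset selection done by pivoted incomplete Cholesky decomposition; the function names, the pivoting rule, and all parameter values are illustrative assumptions, not the authors' exact formulation (which also employs Gram-Schmidt orthonormalization for subset selection).

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    # Gaussian (RBF) kernel from pairwise squared distances.
    d2 = (np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

def incomplete_cholesky(X, gamma, tol=1e-6, max_rank=50):
    """Pivoted incomplete Cholesky of the RBF kernel matrix.
    Returns G (n x m) with K ~= G @ G.T, plus the pivot indices,
    i.e. the selected subset of training samples."""
    n = X.shape[0]
    d = np.ones(n)                 # diagonal of an RBF kernel is all ones
    G = np.zeros((n, max_rank))
    pivots = []
    for j in range(max_rank):
        i = int(np.argmax(d))      # greedy pivot: largest residual diagonal
        if d[i] < tol:
            G = G[:, :j]           # remaining error below tolerance; stop
            break
        pivots.append(i)
        col = rbf_kernel(X, X[i:i + 1], gamma)[:, 0]
        G[:, j] = (col - G[:, :j] @ G[i, :j]) / np.sqrt(d[i])
        d -= G[:, j] ** 2
    return G, np.array(pivots)

def approx_kpca(X, gamma, n_components=2, max_rank=50):
    G, _ = incomplete_cholesky(X, gamma, max_rank=max_rank)
    Gc = G - G.mean(axis=0)        # centring in feature space: Gc = H @ G
    # Diagonalize the small m x m matrix Gc.T @ Gc instead of the n x n
    # centred kernel matrix Gc @ Gc.T; their nonzero eigenvalues coincide.
    evals, V = np.linalg.eigh(Gc.T @ Gc)
    order = np.argsort(evals)[::-1][:n_components]
    # Projections of the training points onto the leading components.
    return Gc @ V[:, order], evals[order]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two noisy concentric circles: a classic nonlinear structure for KPCA.
    t = rng.uniform(0, 2 * np.pi, 400)
    r = np.repeat([1.0, 3.0], 200)
    X = np.c_[r * np.cos(t), r * np.sin(t)] + 0.1 * rng.standard_normal((400, 2))
    Z, evals = approx_kpca(X, gamma=0.5, n_components=2, max_rank=40)
    print("leading eigenvalues:", evals)
    print("embedded shape:", Z.shape)
```

The saving claimed in the abstract shows up in the call to np.linalg.eigh: it runs on an m×m matrix with m bounded by max_rank, rather than on the full n×n kernel matrix, and only the n×m factor G needs to be stored.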