Matrix-based kernel principal component analysis for large-scale data set

  • Authors:
  • Weiya Shi; Yue-Fei Guo; Xiangyang Xue

  • Affiliations:
  • School of Computer Science, Fudan University, Shanghai, China (all authors)

  • Venue:
  • IJCNN'09: Proceedings of the 2009 International Joint Conference on Neural Networks
  • Year:
  • 2009


Abstract

Kernel Principal Component Analysis (KPCA) is a nonlinear feature extraction approach that generally requires eigen-decomposing the kernel matrix. Since the size of the kernel matrix scales with the number of data points, it is infeasible to store and compute it for large-scale data sets. To overcome the computational and storage problems posed by large-scale data sets, a new framework, Matrix-based Kernel Principal Component Analysis (M-KPCA), is proposed. By dividing the large-scale data set into small subsets, the autocorrelation matrix of each subset is treated as a basic computational unit. A novel polynomial-matrix kernel function is adopted to compute the similarity between data matrices instead of between vectors. It is also proved that the polynomial kernel is the extreme case of the polynomial-matrix one. The proposed M-KPCA greatly reduces the size of the kernel matrix, making its computation feasible. Its effectiveness is demonstrated by experimental results on artificial and real data sets.
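
Below is a minimal NumPy sketch of the workflow the abstract describes: partition the data into subsets, represent each subset by its autocorrelation matrix, and eigen-decompose a small kernel matrix built between those matrices. The specific polynomial-matrix kernel form k(A, B) = (tr(AB))^d, the centering step, and all names (m_kpca_sketch, n_subsets, degree) are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def m_kpca_sketch(X, n_subsets=10, degree=2, n_components=2):
    """Illustrative sketch of a matrix-based KPCA pipeline.

    Assumes a polynomial-matrix kernel k(A, B) = trace(A @ B) ** degree
    between subset autocorrelation matrices; the kernel actually used
    in the paper may differ.
    """
    subsets = np.array_split(X, n_subsets)

    # One d x d autocorrelation matrix per subset serves as the
    # computational unit in place of individual sample vectors.
    units = [S.T @ S / len(S) for S in subsets]

    # Small m x m kernel matrix, with m = n_subsets << number of samples.
    K = np.empty((n_subsets, n_subsets))
    for i in range(n_subsets):
        for j in range(n_subsets):
            K[i, j] = np.trace(units[i] @ units[j]) ** degree

    # Center the kernel matrix and eigen-decompose it, as in standard KPCA.
    one = np.full((n_subsets, n_subsets), 1.0 / n_subsets)
    Kc = K - one @ K - K @ one + one @ K @ one
    eigvals, eigvecs = np.linalg.eigh(Kc)

    # Keep the leading components (largest eigenvalues first).
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvals[order], eigvecs[:, order]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((1000, 5))
    vals, vecs = m_kpca_sketch(X)
    print(vals.shape, vecs.shape)  # (2,), (10, 2)
```

With 1000 samples split into 10 subsets, the eigen-decomposition above operates on a 10 x 10 matrix rather than a 1000 x 1000 one, which is the storage and computation saving the abstract claims.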