A covariance-free iterative algorithm for distributed principal component analysis on vertically partitioned data

  • Authors:
  • Yue-Fei Guo; Xiaodong Lin; Zhou Teng; Xiangyang Xue; Jianping Fan

  • Affiliations:
  • School of Computer Science, Fudan University, Shanghai, China; Department of Management and Information Systems, Rutgers University, NJ 08854, USA; School of Computer Science, Fudan University, Shanghai, China; School of Computer Science, Fudan University, Shanghai, China; Department of Computer Science, University of North Carolina at Charlotte, NC 28223, USA

  • Venue:
  • Pattern Recognition
  • Year:
  • 2012

Abstract

In this paper, a covariance-free iterative algorithm is developed for distributed principal component analysis (PCA) on high-dimensional, vertically partitioned data sets. We prove that the algorithm converges monotonically at an exponential rate. Unlike existing techniques, which aim to approximate the global PCA, our covariance-free iterative distributed PCA (CIDPCA) algorithm estimates the principal components directly, without computing the sample covariance matrix, and thereby significantly reduces transmission costs. Furthermore, compared with existing distributed PCA techniques, CIDPCA provides more accurate estimates of the principal components and more accurate classification results. We demonstrate the superior performance of CIDPCA in experiments on multiple real-world data sets.
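
To illustrate the general idea of estimating a principal component without forming the covariance matrix, the following is a minimal sketch of a plain power iteration over vertically partitioned data. It is not the CIDPCA algorithm itself, whose update rule is not given in the abstract. The function names (`local_scores`, `local_update`, `distributed_first_pc`) and the assumption that each site holds a block of already centered feature columns for the same n samples are hypothetical choices made for illustration.

```python
import math
import random


def local_scores(block, w):
    """Site-local step: project this site's columns onto its slice of w."""
    return [sum(x * wj for x, wj in zip(row, w)) for row in block]


def local_update(block, scores):
    """Site-local step: compute X_k^T s using only this site's columns."""
    n_cols = len(block[0])
    return [sum(row[j] * s for row, s in zip(block, scores)) for j in range(n_cols)]


def distributed_first_pc(blocks, iters=200, seed=0):
    """Estimate the leading principal direction of X = [X_1 | ... | X_K].

    Assumption: each block is an n x d_k list of rows holding the centered
    features stored at one site. Only length-n score vectors and one scalar
    per site cross site boundaries; no covariance matrix is ever formed.
    """
    rng = random.Random(seed)
    # Each site starts with a random slice of the direction vector.
    ws = [[rng.gauss(0, 1) for _ in block[0]] for block in blocks]
    n = len(blocks[0])
    for _ in range(iters):
        # Each site sends its partial scores; their sum equals X w.
        partial = [local_scores(b, w) for b, w in zip(blocks, ws)]
        scores = [sum(p[i] for p in partial) for i in range(n)]
        # Each site updates its own slice: w_k <- X_k^T (X w).
        ws = [local_update(b, scores) for b in blocks]
        # Normalising needs only the squared norm of each site's slice.
        norm = math.sqrt(sum(v * v for w in ws for v in w))
        ws = [[v / norm for v in w] for w in ws]
    return ws
```

In this sketch every round exchanges vectors of length n (the number of samples), whereas assembling the full sample covariance matrix would require cross-site products between feature columns held at different sites. That contrast is the kind of transmission saving the abstract refers to, although the actual costs of CIDPCA depend on details not stated here.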