Efficient computation of PCA with SVD in SQL

  • Authors:
  • Mario Navas;Carlos Ordonez

  • Affiliations:
  • University of Houston, Houston, TX;University of Houston, Houston, TX

  • Venue:
  • Proceedings of the 2nd Workshop on Data Mining using Matrices and Tensors
  • Year:
  • 2009

Abstract

PCA is one of the most common dimensionality reduction techniques, with broad applications in data mining, statistics and signal processing. In this work we study how to leverage a DBMS's computing capabilities to solve PCA. We propose a solution that combines a summarization of the data set with the correlation or covariance matrix and then solves PCA with Singular Value Decomposition (SVD). Deriving the summary matrices allows large data sets to be analyzed, since the summaries can be computed in a single pass. Solving SVD without external libraries proves to be challenging in SQL. We introduce two solutions: one based on SQL queries and a second based on User-Defined Functions. Experimental evaluation shows our method can solve larger problems in less time than external statistical packages.
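The single-pass summarization idea can be illustrated with a minimal Python sketch. The abstract does not name the summary matrices, so the choice of a row count `n`, a vector of column sums `L`, and a matrix of cross-product sums `Q` is an assumption, as are all function names. The covariance uses the population form (dividing by `n`). The final step uses plain power iteration as a simple stand-in to extract the leading principal component; it is not the SVD procedure the authors implement in SQL or UDFs.

```python
import math

def summarize(rows):
    """One pass over the data: row count n, column sums L, cross-product sums Q."""
    n, L, Q = 0, None, None
    for x in rows:
        if L is None:
            d = len(x)
            L = [0.0] * d
            Q = [[0.0] * d for _ in range(d)]
        n += 1
        for a in range(d):
            L[a] += x[a]
            for b in range(d):
                Q[a][b] += x[a] * x[b]
    return n, L, Q

def covariance(n, L, Q):
    """Population covariance derived from the summaries alone (no second pass)."""
    d = len(L)
    return [[Q[a][b] / n - (L[a] / n) * (L[b] / n) for b in range(d)]
            for a in range(d)]

def correlation(n, L, Q):
    """Correlation matrix: covariance scaled by the per-column standard deviations."""
    V = covariance(n, L, Q)
    d = len(L)
    return [[V[a][b] / math.sqrt(V[a][a] * V[b][b]) for b in range(d)]
            for a in range(d)]

def leading_component(M, iters=200):
    """Power iteration on a symmetric matrix (a stand-in, not a full SVD).

    Assumes M is nonzero and the all-ones start vector is not orthogonal
    to the leading eigenvector.
    """
    d = len(M)
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(M[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    # Rayleigh quotient gives the eigenvalue for the unit-length vector v.
    eigenvalue = sum(v[a] * sum(M[a][b] * v[b] for b in range(d))
                     for a in range(d))
    return eigenvalue, v
```

Because `n`, `L` and `Q` are plain sums, they could in principle be accumulated by aggregate queries inside the DBMS, leaving only a small d x d matrix for the decomposition step.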