Efficient computation of PCA with SVD in SQL

Authors:
Mario Navas;Carlos Ordonez
Affiliations:
University of Houston, Houston, TX;University of Houston, Houston, TX
Venue:
Proceedings of the 2nd Workshop on Data Mining using Matrices and Tensors
Year:
2009

Abstract

PCA is one of the most common dimensionality reduction techniques, with broad applications in data mining, statistics and signal processing. In this work we study how to leverage a DBMS's computing capabilities to solve PCA. We propose a solution that first summarizes the data set with the correlation or covariance matrix and then solves PCA with Singular Value Decomposition (SVD). Deriving the summary matrices allows analyzing large data sets, since they can be computed in a single pass. Solving SVD in SQL without external libraries proves to be a challenge. We introduce two solutions: one based on SQL queries and a second one based on User-Defined Functions. Experimental evaluation shows our method can solve larger problems in less time than external statistical packages.
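
The single-pass summarization described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the table name `X`, the column names `x1` and `x2`, and all function names are hypothetical, and the covariance uses the population form (dividing by n). In place of a full SVD, the sketch uses plain power iteration to recover only the leading principal component; for a symmetric positive semidefinite matrix such as a correlation matrix, the SVD coincides with the eigendecomposition, so the result is the first singular value and vector.

```python
import math
import sqlite3

def summarize(conn, table, cols):
    """One table scan: n, L[a] = sum(x_a), Q[a][b] = sum(x_a * x_b).
    Table and column names are interpolated directly, so they must be trusted."""
    d = len(cols)
    terms = ["COUNT(*)"]
    terms += [f"SUM({c})" for c in cols]
    terms += [f"SUM({cols[a]} * {cols[b]})" for a in range(d) for b in range(a, d)]
    row = conn.execute(f"SELECT {', '.join(terms)} FROM {table}").fetchone()
    n, L = row[0], list(row[1:1 + d])
    Q = [[0.0] * d for _ in range(d)]
    k = 1 + d
    for a in range(d):
        for b in range(a, d):
            Q[a][b] = Q[b][a] = row[k]  # Q is symmetric
            k += 1
    return n, L, Q

def covariance(n, L, Q):
    """Population covariance derived from the summaries alone."""
    d = len(L)
    return [[Q[a][b] / n - (L[a] / n) * (L[b] / n) for b in range(d)]
            for a in range(d)]

def correlation(n, L, Q):
    """Correlation matrix; assumes every column has nonzero variance."""
    V = covariance(n, L, Q)
    d = len(L)
    return [[V[a][b] / math.sqrt(V[a][a] * V[b][b]) for b in range(d)]
            for a in range(d)]

def leading_component(M, iters=200):
    """Power iteration for the dominant eigenpair of a symmetric PSD matrix.
    Assumes the start vector is not orthogonal to the leading eigenvector."""
    d = len(M)
    v = [1.0 / (a + 1) for a in range(d)]
    for _ in range(iters):
        w = [sum(M[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(v[a] * sum(M[a][b] * v[b] for b in range(d)) for a in range(d))
    return lam, v

# Usage on a small in-memory table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE X (x1 REAL, x2 REAL)")
conn.executemany("INSERT INTO X VALUES (?, ?)",
                 [(1, 2), (2, 4), (3, 6), (4, 8)])
n, L, Q = summarize(conn, "X", ["x1", "x2"])
eigenvalue, component = leading_component(correlation(n, L, Q))
```

Only the aggregate query touches the data set. Everything after it works on a d × d matrix, which is why the size of the input table affects the scan but not the decomposition step.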