Principal Component Analysis (PCA) is one of the most common dimensionality reduction techniques, with broad applications in data mining, statistics, and signal processing. In this work we study how to leverage a DBMS's computing capabilities to solve PCA. We propose a solution that combines a summarization of the data set with the correlation or covariance matrix and then solves PCA with Singular Value Decomposition (SVD). Deriving the summary matrices allows analyzing large data sets, since they can be computed in a single pass. Solving SVD in SQL without external libraries proves to be a challenge. We introduce two solutions: one based on SQL queries and a second one based on User-Defined Functions (UDFs). Experimental evaluation shows that our method can solve larger problems in less time than external statistical packages.
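The pipeline described above (one-pass summarization, then a correlation matrix, then a decomposition) can be illustrated with a minimal sketch in plain Python. This is not the SQL or UDF code from this work. The names `summarize`, `correlation`, and `jacobi_eigen` are hypothetical, and so is the choice of summaries: a row count `n`, linear sums `L`, and cross-product sums `Q`. Cyclic Jacobi rotations stand in for the SVD step, which the abstract does not specify; for a symmetric positive semidefinite matrix such as a correlation matrix, the eigen-decomposition coincides with the SVD. The sketch also assumes no column is constant, since a zero standard deviation would make the correlation undefined.

```python
import math

def summarize(rows):
    """Single pass over the data: row count n, linear sums L, cross-product sums Q."""
    n, L, Q = 0, None, None
    for x in rows:
        d = len(x)
        if L is None:
            L = [0.0] * d
            Q = [[0.0] * d for _ in range(d)]
        n += 1
        for a in range(d):
            L[a] += x[a]
            for b in range(d):
                Q[a][b] += x[a] * x[b]
    return n, L, Q

def correlation(n, L, Q):
    """Correlation matrix computed from the summaries only (no second data pass)."""
    d = len(L)
    cov = [[Q[a][b] / n - (L[a] / n) * (L[b] / n) for b in range(d)]
           for a in range(d)]
    sd = [math.sqrt(cov[a][a]) for a in range(d)]  # assumes no constant column
    return [[cov[a][b] / (sd[a] * sd[b]) for b in range(d)] for a in range(d)]

def jacobi_eigen(A, sweeps=50, tol=1e-20):
    """Eigenvalues and eigenvectors of a symmetric matrix via cyclic Jacobi rotations."""
    d = len(A)
    A = [row[:] for row in A]  # work on a copy
    V = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    for _ in range(sweeps):
        off = sum(A[i][j] ** 2 for i in range(d) for j in range(d) if i != j)
        if off < tol:
            break
        for p in range(d - 1):
            for q in range(p + 1, d):
                if abs(A[p][q]) < 1e-15:
                    continue
                # Angle that zeroes A[p][q]; folded into [-pi/4, pi/4].
                theta = 0.5 * math.atan2(2.0 * A[p][q], A[q][q] - A[p][p])
                if theta > math.pi / 4:
                    theta -= math.pi / 2
                elif theta < -math.pi / 4:
                    theta += math.pi / 2
                c, s = math.cos(theta), math.sin(theta)
                for k in range(d):  # A <- A J (columns p and q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(d):  # A <- J^T A (rows p and q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(d):  # accumulate eigenvectors: V <- V J
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    order = sorted(range(d), key=lambda i: -A[i][i])  # largest variance first
    values = [A[i][i] for i in order]
    vectors = [[V[k][i] for k in range(d)] for i in order]  # one component per row
    return values, vectors

if __name__ == "__main__":
    data = [(1.0, 2.0, 1.0), (2.0, 4.0, 0.0), (3.0, 6.0, 2.0), (4.0, 8.0, 1.0)]
    n, L, Q = summarize(data)
    values, vectors = jacobi_eigen(correlation(n, L, Q))
    for lam, v in zip(values, vectors):
        print(round(lam, 4), [round(c, 4) for c in v])
```

In this sketch the summaries take O(d²) space regardless of the number of rows, so the decomposition step never touches the original data set. Note that the formula `Q/n - mean²` can lose precision when means are large relative to the variances; a production implementation would likely need a more careful formulation.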