Principal Component Analysis for Large Scale Problems with Lots of Missing Values

  • Authors:
  • Tapani Raiko, Alexander Ilin, Juha Karhunen

  • Affiliations:
  • Adaptive Informatics Research Center, Helsinki Univ. of Technology, P.O. Box 5400, FI-02015 TKK, Finland (all authors)

  • Venue:
  • ECML '07: Proceedings of the 18th European Conference on Machine Learning
  • Year:
  • 2007

Abstract

Principal component analysis (PCA) is a well-known classical data analysis technique. There are a number of algorithms for solving the problem, some scaling better than others to problems of high dimensionality. They also differ in their ability to handle missing values in the data. We study the case where the data are high-dimensional and a majority of the values are missing. With very sparse data, overfitting becomes a severe problem even in simple linear models such as PCA. We propose an algorithm based on speeding up a simple principal subspace rule, and extend it to use regularization and variational Bayesian (VB) learning. Experiments with the Netflix data confirm that the proposed algorithm is much faster than any of the compared methods, and that the VB-PCA method provides more accurate predictions for new data than traditional PCA or regularized PCA.
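The abstract describes fitting a low-rank PCA model to only the observed entries of a sparse matrix, with regularization to counter overfitting. The sketch below illustrates that general idea with plain gradient descent on a regularized reconstruction error restricted to observed values; it is not the paper's sped-up subspace rule or its VB extension, and the function name pca_missing and parameters such as lr and reg are illustrative choices of our own.

```python
import numpy as np

def pca_missing(X, mask, n_components=5, lr=0.01, reg=0.1, n_iters=1000, seed=0):
    """Regularized low-rank fit X ~ A @ S, using only observed entries.

    X    : (d, n) data matrix; missing entries should be set to 0
    mask : (d, n) boolean array, True where X is observed
    """
    rng = np.random.default_rng(seed)
    d, n = X.shape
    A = 0.1 * rng.standard_normal((d, n_components))   # loading matrix
    S = 0.1 * rng.standard_normal((n_components, n))   # component matrix
    for _ in range(n_iters):
        # reconstruction error, zeroed out at missing entries
        E = mask * (A @ S - X)
        # gradient steps with L2 (weight-decay) regularization
        A -= lr * (E @ S.T + reg * A)
        S -= lr * (A.T @ E + reg * S)
    return A, S

# toy usage: rank-5 data with 90% of the values missing
rng = np.random.default_rng(1)
X_true = rng.standard_normal((100, 5)) @ rng.standard_normal((5, 200))
mask = rng.random(X_true.shape) < 0.1
A, S = pca_missing(np.where(mask, X_true, 0.0), mask, n_components=5)
rmse = np.sqrt(np.mean((A @ S - X_true)[~mask] ** 2))  # error on held-out entries
print(f"held-out RMSE: {rmse:.3f}")
```

Restricting the error term to observed entries via the mask is what lets the model be trained at all when most values are missing; the regularization term shrinks both factors and plays the overfitting-control role that the abstract attributes to regularized and VB-PCA.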