Recently, the variational Bayesian approximation was applied to probabilistic matrix factorization and was shown to perform very well in experiments. However, the mechanism behind this good performance was not well understood beyond the experimental evidence. The purpose of this paper is to theoretically elucidate properties of a variational Bayesian matrix factorization method; in particular, we analyze its mechanism for avoiding overfitting. Our analysis relies on the key fact that the matrix factorization model is non-identifiable, i.e., the mapping between the factorized matrices and the original matrix is not one-to-one. The positive-part James-Stein shrinkage operator and the Marcenko-Pastur law (the limiting distribution of the eigenvalues of the central Wishart distribution) play important roles in our analysis.
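To make the role of positive-part James-Stein shrinkage concrete, the sketch below applies a shrinkage factor of the form max(0, 1 - c/s^2) to each singular value of a noisy matrix. This is only an illustration of the general shrinkage mechanism the abstract refers to: the multiplicative form and the constant `c` are assumptions chosen for demonstration, not the estimator derived in the paper.

```python
import numpy as np

def positive_part_shrinkage(V, c):
    """Shrink each singular value s of V by the positive-part
    James-Stein-type factor max(0, 1 - c / s**2).

    The constant c is a hypothetical regularization strength chosen
    for illustration, not the value derived in the paper.
    """
    U, s, Vt = np.linalg.svd(V, full_matrices=False)
    s_shrunk = s * np.maximum(0.0, 1.0 - c / s**2)
    # Reassemble the matrix with the shrunken spectrum.
    return U @ (s_shrunk[:, None] * Vt)

# Toy data: a rank-3 signal observed under small Gaussian noise.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 40))
V = A + 0.1 * rng.normal(size=(50, 40))

# Singular values at or below sqrt(c) are set exactly to zero,
# so the small noise directions are pruned away.
V_hat = positive_part_shrinkage(V, c=4.0)
print(np.linalg.matrix_rank(V_hat))
```

Because the shrinkage factor vanishes for small singular values rather than merely damping them, the reconstruction recovers a genuinely low-rank estimate, which is the overfitting-avoidance behavior the paper analyzes.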