LA-LDA: a limited attention topic model for social recommendation
SBP'13 Proceedings of the 6th international conference on Social Computing, Behavioral-Cultural Modeling and Prediction
The rapid growth of social data in the form of videos, microblog posts, and other items shared on social media presents new opportunities for learning user behavior and preferences. Bayesian models have been widely used for modeling social data, since they capture uncertainty and prior knowledge, avoid overfitting, and can be easily extended to incorporate new types of data. Researchers have used a variety of inference procedures to learn model parameters from data. In particular, the Stochastic Gradient Fisher Scoring (SGFS) method was recently proposed for efficient inference. This method samples from a Bayesian posterior using a small number of data samples in each iteration, instead of the entire data set, to speed up inference. In this paper we explore the feasibility of SGFS for social data mining. We find that SGFS often outperforms other inference methods on dense data, but it fails in the sparse "long tail", where there are not enough instances for it to learn parameters. This is problematic, because social data often has a long-tailed distribution. To address this problem, we propose hybrid SGFS (hSGFS) and evaluate its performance on a variety of social data sets. We find that hSGFS is better able to predict held-out items in data sets that have a long-tailed distribution.
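The core idea behind SGFS-style samplers is to replace the full-data gradient of the log posterior with an unbiased mini-batch estimate, plus injected noise so the iterates sample from the posterior rather than merely optimize it. The following is a minimal sketch of that mini-batch idea on a toy model (a Gaussian mean with a Gaussian prior); for simplicity it uses the plain stochastic gradient Langevin dynamics update, whereas full SGFS additionally preconditions the step with an estimate of the Fisher information. All variable names and constants here are illustrative, not from the paper.

```python
import numpy as np

# Toy model: x_i ~ N(theta, 1), prior theta ~ N(0, 10).
# We sample theta from its posterior using mini-batch noisy gradient steps.
rng = np.random.default_rng(0)
N = 10_000
data = rng.normal(2.0, 1.0, size=N)  # synthetic data with true mean 2.0

def grad_log_posterior(theta, batch):
    """Unbiased mini-batch estimate of d/dtheta log p(theta | data)."""
    grad_prior = -theta / 10.0                            # from log N(theta; 0, 10)
    grad_lik = (N / len(batch)) * np.sum(batch - theta)   # rescaled likelihood term
    return grad_prior + grad_lik

theta, eps, batch_size = 0.0, 1e-5, 100
samples = []
for t in range(2_000):
    batch = rng.choice(data, size=batch_size, replace=False)
    noise = rng.normal(0.0, np.sqrt(eps))  # injected noise turns SGD into a sampler
    theta = theta + 0.5 * eps * grad_log_posterior(theta, batch) + noise
    if t >= 500:                           # discard burn-in iterations
        samples.append(theta)

posterior_mean = np.mean(samples)  # close to the sample mean of the data
```

The paper's observation about the long tail can be read off this sketch: the mini-batch gradient is only a useful estimate when each entity appears in enough batches; for tail items with very few observations, the noisy gradient signal is too weak to learn their parameters, which motivates a hybrid scheme like hSGFS.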