Scalable mining of social data using stochastic gradient Fisher scoring

  • Authors:
  • Jeon-Hyung Kang; Kristina Lerman

  • Affiliations:
  • University of Southern California Information Sciences Institute, Marina del Rey, CA, USA; University of Southern California Information Sciences Institute, Marina del Rey, CA, USA

  • Venue:
  • Proceedings of the 2013 Workshop on Data-Driven User Behavioral Modelling and Mining from Social Media
  • Year:
  • 2013

Abstract

The rapid growth of social data in the form of videos, microblog posts and other items shared on social media presents new opportunities for learning user behavior and preferences. Bayesian models have been widely used to model social data because they capture uncertainty and prior knowledge, avoid overfitting, and can easily be extended to incorporate new types of data. Researchers have used a variety of inference procedures to learn model parameters from data. Recently, the Stochastic Gradient Fisher Scoring (SGFS) method was proposed for efficient inference. To speed up inference, this method samples from a Bayesian posterior using a small number of data samples in each iteration instead of the entire data set. In this paper we explore the feasibility of SGFS for social data mining. We find that SGFS often outperforms other inference methods on dense data, but it fails in the sparse "long tail," where there are not enough instances for it to learn the parameters. This is problematic because social data often has a long-tailed distribution. To address this problem, we propose hybrid SGFS (hSGFS) and evaluate its performance on a variety of social data sets. We find that hSGFS is better able to predict held-out items in data sets that have a long-tailed distribution.
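The minibatch idea behind SGFS can be made concrete with a toy example. The following is a minimal Python sketch of an SGFS-style update for a one-dimensional model. It is our own simplified illustration, not the authors' implementation, and it does not attempt hSGFS. It assumes a Gaussian likelihood with unit variance and a Gaussian prior on the mean. It follows our reading of the original SGFS update (Ahn, Korattikara and Welling, 2012) in its large-step-size limit, where the injected-noise term vanishes and the minibatch gradient noise alone supplies the randomness. The function name `sgfs_gaussian_mean` and all of its parameters are hypothetical.

```python
import random
import statistics


def sgfs_gaussian_mean(data, batch_size=10, n_iters=5000, gamma=1.0,
                       prior_var=100.0, seed=0):
    """Hypothetical SGFS-style sampler for the mean theta of 1-D data.

    Assumed model (illustration only): x_i ~ Normal(theta, 1) with
    prior theta ~ Normal(0, prior_var). Uses the large-step-size form
    of the SGFS update, so no extra Gaussian noise is injected.
    """
    rng = random.Random(seed)
    N = len(data)
    theta = 0.0
    fisher = 1.0  # running estimate of per-item Fisher information
    samples = []
    for t in range(1, n_iters + 1):
        # Each iteration touches only a small minibatch, not all N items.
        batch = rng.sample(data, batch_size)
        # Per-item score: d/dtheta log Normal(x | theta, 1) = x - theta.
        scores = [x - theta for x in batch]
        mean_score = statistics.fmean(scores)
        # Empirical variance of the scores estimates the Fisher information;
        # kappa = 1/t turns the update into a running average over iterations.
        kappa = 1.0 / t
        fisher = (1.0 - kappa) * fisher + kappa * statistics.variance(scores)
        prior_grad = -theta / prior_var
        # Preconditioner gamma * (N + n) / n * N * I_hat (a scalar in 1-D).
        precond = gamma * (N + batch_size) / batch_size * N * fisher
        theta += 2.0 * (prior_grad + N * mean_score) / precond
        samples.append(theta)
    return samples


if __name__ == "__main__":
    gen = random.Random(1)
    data = [gen.gauss(3.0, 1.0) for _ in range(1000)]
    draws = sgfs_gaussian_mean(data)[1000:]  # discard burn-in draws
    print("data mean:          ", statistics.fmean(data))
    print("posterior mean est.:", statistics.fmean(draws))
    print("posterior var est.: ", statistics.pvariance(draws))
```

In this toy setting the post-burn-in draws should center near the data mean. Their spread should be on the order of 1/N rather than collapsing to a single point, because the (N + n)/n factor scales each update so that minibatch noise stands in for posterior uncertainty. The sketch does not address the sparse long-tail parameters that the abstract identifies as the weakness of SGFS.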