Emerging topic detection using dictionary learning

  • Authors:
  • Shiva Prasad Kasiviswanathan;Prem Melville;Arindam Banerjee;Vikas Sindhwani

  • Affiliations:
  • IBM TJ Watson Research, Yorktown Heights, NY, USA;IBM TJ Watson Research, Yorktown Heights, NY, USA;University of Minnesota, Twin Cities, MN, USA;IBM TJ Watson Research, Yorktown Heights, NY, USA

  • Venue:
  • Proceedings of the 20th ACM international conference on Information and knowledge management
  • Year:
  • 2011

Abstract

Streaming user-generated content in the form of blogs, microblogs, forums, and multimedia sharing sites provides a rich source of data from which invaluable information and insights may be gleaned. Given the vast volume of such social media data being continually generated, one of the challenges is to automatically tease apart the emerging topics of discussion from the constant background chatter. Such emerging topics can be identified by the appearance of multiple posts on a unique subject matter that is distinct from previous online discourse. We address the problem of identifying emerging topics through the use of dictionary learning. We propose a two-stage approach: the first stage detects novel user-generated content, and the second clusters it. We derive a scalable approach by using the alternating directions method to solve the resulting optimization problems. Empirical results show that our proposed approach is more effective than several baselines in detecting emerging topics in traditional news story and newsgroup data. We also demonstrate the practical application to social media analysis, based on a study of streaming data from Twitter.
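The detection stage can be illustrated with a minimal sketch, under assumptions that go beyond the abstract. A document vector is sparse-coded against a dictionary learned from earlier documents, and a large reconstruction residual is read as a sign of novelty. The function names `sparse_code` and `novelty_score`, the toy dictionary `D`, the penalty `lam`, and the squared-error loss solved with ISTA are all illustrative choices; the paper itself solves its optimization problems with the alternating directions method, which is not reproduced here.

```python
import numpy as np

def sparse_code(D, y, lam=0.1, n_iter=200):
    """Approximately solve min_x 0.5*||y - D x||^2 + lam*||x||_1 using ISTA.

    Assumption: ISTA with a squared-error loss stands in for the paper's
    alternating directions solver.
    """
    # Lipschitz constant of the gradient: squared spectral norm of D.
    L = np.linalg.norm(D, 2) ** 2
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ x - y)          # gradient of the smooth term
        z = x - grad / L                  # gradient step
        # Soft-thresholding: proximal step for the L1 penalty.
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return x

def novelty_score(D, y, lam=0.1):
    """Residual norm after sparse coding; larger means less well explained."""
    x = sparse_code(D, y, lam)
    return float(np.linalg.norm(y - D @ x))

# Hypothetical toy dictionary: 2 atoms over a 4-term vocabulary.
D = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
D /= np.linalg.norm(D, axis=0)            # unit-norm atoms (columns)

known = np.array([1.0, 1.0, 0.0, 0.0])    # lies in the span of the first atom
novel = np.array([0.0, 0.0, 0.0, 1.0])    # uses a term no atom covers

score_known = novelty_score(D, known)
score_novel = novelty_score(D, novel)
```

In this toy setting the document built from an unseen term receives a higher score than the one the dictionary already explains. In a two-stage pipeline of the kind the abstract describes, documents whose score exceeds a chosen threshold would then be passed to the clustering stage.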