The dual-sparse topic model: mining focused topics and focused terms in short text

  • Authors:
  • Tianyi Lin; Wentao Tian; Qiaozhu Mei; Hong Cheng

  • Affiliations:
  • The Chinese University of Hong Kong, Shatin, Hong Kong; The Chinese University of Hong Kong, Shatin, Hong Kong; University of Michigan, Ann Arbor, MI, USA; The Chinese University of Hong Kong, Shatin, Hong Kong

  • Venue:
  • Proceedings of the 23rd International Conference on World Wide Web
  • Year:
  • 2014

Abstract

Topic modeling has proven to be an effective method for exploratory text mining. Most topic models assume that a document is generated from a mixture of topics. In real-world scenarios, however, an individual document usually concentrates on a few salient topics rather than covering a wide variety of them. Likewise, a real topic uses a narrow range of terms rather than covering much of the vocabulary. Understanding this sparsity of information is especially important for analyzing user-generated Web content and social media, which are characterized by extremely short posts and condensed discussions. In this paper, we propose a dual-sparse topic model that addresses the sparsity in both the topic mixtures and the word usage. By applying a "Spike and Slab" prior to decouple the sparsity and smoothness of the document-topic and topic-word distributions, we allow individual documents to select a few focused topics and individual topics to select a few focused terms. Experiments on large corpora of different genres demonstrate that the dual-sparse topic model outperforms both classical topic models and existing sparsity-enhanced topic models. The improvement is especially notable on collections of short documents.
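
To illustrate how a "Spike and Slab" prior can decouple sparsity from smoothness, the following Python sketch draws one sparse distribution: Bernoulli selectors (the spike) decide which components are switched on, and a Dirichlet (the slab) smooths the probability mass over the selected components. This is only a minimal sketch and not the authors' implementation. The function name `sample_sparse_distribution`, its parameters, the weak prior on unselected components, and the rule that forces at least one selected component are all assumptions made for illustration.

```python
import numpy as np


def sample_sparse_distribution(size, select_prob, smoothing, weak_smoothing, rng):
    """Draw one sparse distribution under a spike-and-slab style prior.

    All names and defaults here are hypothetical, chosen for illustration.
    """
    # Spike: a Bernoulli selector per component decides whether it is "on".
    selectors = rng.random(size) < select_prob
    # Assumption: force at least one selected component so that the
    # distribution always has somewhere to put its mass.
    if not selectors.any():
        selectors[rng.integers(size)] = True
    # Slab: selected components get the regular smoothing prior; unselected
    # ones get a far weaker prior, so the Dirichlet stays well defined while
    # their probability mass is negligible.
    concentration = np.where(selectors, smoothing, weak_smoothing)
    distribution = rng.dirichlet(concentration)
    return selectors, distribution


rng = np.random.default_rng(0)

# Document side: a short document focuses on a few of 20 topics.
topic_selectors, topic_mixture = sample_sparse_distribution(
    size=20, select_prob=0.15, smoothing=1.0, weak_smoothing=1e-7, rng=rng
)

# Topic side: a topic focuses on a few terms of a 1000-word vocabulary.
term_selectors, term_distribution = sample_sparse_distribution(
    size=1000, select_prob=0.02, smoothing=0.1, weak_smoothing=1e-7, rng=rng
)

print("focused topics:", np.flatnonzero(topic_selectors))
print("number of focused terms:", int(term_selectors.sum()))
```

Applying the same construction on both sides is what makes the sketch "dual-sparse": the selector probability controls how many components are active, while the smoothing value independently controls how evenly mass is spread among them.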