The dual-sparse topic model: mining focused topics and focused terms in short text

Authors:
Tianyi Lin; Wentao Tian; Qiaozhu Mei; Hong Cheng
Affiliations:
The Chinese University of Hong Kong, Shatin, Hong Kong; The Chinese University of Hong Kong, Shatin, Hong Kong; University of Michigan, Ann Arbor, MI, USA; The Chinese University of Hong Kong, Shatin, Hong Kong
Venue:
Proceedings of the 23rd International Conference on World Wide Web
Year:
2014



Abstract

Topic modeling has proven to be an effective method for exploratory text mining. Most topic models assume that a document is generated from a mixture of topics. In real-world scenarios, however, an individual document usually concentrates on a few salient topics rather than covering a wide variety of them. Likewise, a real topic uses a narrow range of terms rather than covering much of the vocabulary. Understanding this sparsity of information is especially important for analyzing user-generated Web content and social media, which are characterized by extremely short posts and condensed discussions. In this paper, we propose a dual-sparse topic model that addresses the sparsity in both the topic mixtures and the word usage. By applying a "Spike and Slab" prior to decouple the sparsity and smoothness of the document-topic and topic-word distributions, we allow individual documents to select a few focused topics and a topic to select focused terms, respectively. Experiments on different genres of large corpora demonstrate that the dual-sparse topic model outperforms both classical topic models and existing sparsity-enhanced topic models. This improvement is especially notable on collections of short documents.
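
The abstract does not spell out the generative process, so the following is only an illustrative sketch of the general spike-and-slab idea it mentions, not the authors' model. It assumes that binary selectors (the "spike") switch individual topics or terms on, and that a Dirichlet draw (the "slab") then spreads probability smoothly over the selected entries, with a tiny smoothing weight everywhere else. The function name, the parameters `select_prob`, `slab` and `smooth`, and all numeric values are hypothetical.

```python
import random


def sample_sparse_distribution(rng, size, select_prob, slab=1.0, smooth=1e-7):
    """Draw a probability vector concentrated on a few selected entries.

    Illustrative only: names and default values are assumptions,
    not taken from the paper.
    """
    # "Spike": one Bernoulli selector per entry decides whether it is on.
    selectors = [rng.random() < select_prob for _ in range(size)]
    # Guard for this sketch only: keep at least one entry switched on.
    if not any(selectors):
        selectors[rng.randrange(size)] = True
    # "Slab": selected entries get the regular Dirichlet parameter,
    # all other entries keep only a tiny smoothing parameter.
    concentration = [slab * s + smooth for s in selectors]
    # A Dirichlet draw is a vector of normalized Gamma draws.
    gammas = [rng.gammavariate(c, 1.0) for c in concentration]
    total = sum(gammas)
    return selectors, [g / total for g in gammas]


rng = random.Random(0)
# A document selecting a few focused topics out of 20.
topic_selectors, theta = sample_sparse_distribution(rng, size=20, select_prob=0.15)
# A topic selecting a few focused terms out of a 1000-word vocabulary.
word_selectors, phi = sample_sparse_distribution(rng, size=1000, select_prob=0.02)

focused_topics = [k for k, on in enumerate(topic_selectors) if on]
print("focused topics:", focused_topics)
print("mass on focused topics:", sum(theta[k] for k in focused_topics))
```

In this sketch the selectors control which entries are active (sparsity), while the Dirichlet parameters control how evenly mass is spread among the active entries (smoothness), which is the decoupling the abstract refers to.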