Topic modeling has proven to be an effective method for exploratory text mining. Most topic models share the assumption that a document is generated from a mixture of topics. In real-world scenarios, however, an individual document usually concentrates on a few salient topics rather than covering a wide variety of topics. Likewise, a real topic adopts a narrow range of terms rather than a wide coverage of the vocabulary. Understanding this sparsity of information is especially important for analyzing user-generated Web content and social media, which feature extremely short posts and condensed discussions. In this paper, we propose a dual-sparse topic model that addresses sparsity in both the topic mixtures and the word usage. By applying a "Spike and Slab" prior to decouple the sparsity and smoothness of the document-topic and topic-word distributions, we allow an individual document to select a few focused topics and a topic to select a few focused terms. Experiments on large corpora of different genres demonstrate that the dual-sparse topic model outperforms both classical topic models and existing sparsity-enhanced topic models, with especially notable improvements on collections of short documents.
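To make the "Spike and Slab" idea concrete, the sketch below draws a sparse document-topic distribution: a Bernoulli "spike" selects which topics a document may use at all, and a Dirichlet "slab" smooths the weights over the selected topics only. This is a minimal illustration of the general prior, not the authors' model or inference procedure; all function names and parameter values here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def spike_and_slab_topic_mixture(num_topics=20, spike_prob=0.2, slab_alpha=1.0):
    """Illustrative draw of a sparse document-topic vector.

    Spike: Bernoulli selectors pick a focused subset of topics,
    decoupling sparsity (which topics are active) from smoothness
    (how weight is spread among the active ones).
    """
    # Spike: binary selectors choose a focused subset of topics.
    selected = rng.random(num_topics) < spike_prob
    if not selected.any():  # ensure at least one active topic
        selected[rng.integers(num_topics)] = True
    # Slab: Dirichlet weights over the selected topics; the rest stay exactly zero.
    theta = np.zeros(num_topics)
    theta[selected] = rng.dirichlet(np.full(selected.sum(), slab_alpha))
    return theta

theta = spike_and_slab_topic_mixture()
```

The resulting vector sums to one but places exactly zero mass on unselected topics, unlike a plain Dirichlet prior, which assigns every topic a small positive weight. The same construction applies symmetrically to topic-word distributions, letting a topic use only a narrow range of terms.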