A unified generative model for characterizing microblogs' topics

Authors:
Kun Zhuang;Heyan Huang;Xin Xin;Xiaochi Wei;Xianxiang Yang;Chong Feng;Ying Fang
Affiliations:
School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China (all authors)
Venue:
WAIM'13 Proceedings of the 14th international conference on Web-Age Information Management
Year:
2013

Abstract

This paper addresses the problem of characterizing the topics of microblogs with topic models. Compared with traditional textual media such as news documents, modeling microblogs poses three challenges: 1) excessive noise; 2) short text; and 3) incomplete content. Previous work has investigated these limitations separately: some approaches filter noise through a prior classification step, some enrich the text with the user's blog history, and some exploit the social network. However, none of these approaches addresses all three limitations simultaneously. We therefore combine these ideas and propose a unified generative model for characterizing microblogs' topics, in which all three limitations are handled together. A collapsed Gibbs sampling method is derived for estimating the parameters. Through both qualitative and quantitative analysis on Twitter data, we demonstrate that our approach consistently outperforms previous methods by a significant margin.