Latent topic analysis has emerged as one of the most effective methods for classifying, clustering and retrieving textual data. However, existing models such as Latent Dirichlet Allocation (LDA) were developed for static corpora of relatively long documents. In contrast, much of the textual content on the web, especially on social media, is temporally sequenced and arrives in short fragments: microblog posts on sites such as Twitter and Weibo, status updates on social networking sites such as Facebook and LinkedIn, and comments on content-sharing sites such as YouTube. In this paper we propose a novel topic model, Temporal-LDA (TM-LDA), for efficiently mining text streams, such as a sequence of posts from the same author, by modeling the topic transitions that naturally arise in these data. TM-LDA learns the transition parameters among topics by minimizing the error in predicting the topic distribution of subsequent posts. Once trained, TM-LDA can therefore accurately predict the expected topic distribution of future posts. To make these predictions practical in a realistic online setting, we develop an efficient algorithm that updates the topic transition parameters as new documents stream in. Our empirical results, on a corpus of more than 30 million microblog posts, show that TM-LDA significantly outperforms state-of-the-art static LDA models in estimating the topic distribution of new documents over time. We also demonstrate that TM-LDA can highlight interesting variations in common topic transitions, such as differences in the work-life rhythm of cities, and factors associated with area-specific problems and complaints.
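The abstract describes two mechanisms: learning topic-transition parameters by minimizing the error in predicting the next post's topic distribution, and updating those parameters as new posts stream in. The following is a minimal pure-Python sketch of that idea, not the authors' implementation, and it rests on several assumptions. First, the prediction error is taken to be the ridge-regularized squared error ||AT - B||², where row i of A is the topic vector of a post and row i of B is the topic vector of the same author's next post. Second, the streaming update here simply accumulates the sufficient statistics AᵀA and AᵀB, which may differ from the updating algorithm the paper proposes. Third, topic vectors are assumed to come from an already-trained LDA model. The names `TransitionModel`, `add_pair`, `transition_matrix` and `predict` are hypothetical.

```python
"""Hypothetical sketch of learning a topic-transition matrix T.

Assumption (not from the paper): minimize ||A T - B||^2 + ridge * ||T||^2,
where rows of A are topic distributions of posts and rows of B are the
distributions of the posts that follow them.
"""
from typing import List

Vector = List[float]
Matrix = List[List[float]]


class TransitionModel:
    """Keeps G = A^T A and H = A^T B so new (previous, next) pairs can be
    folded in without revisiting earlier posts."""

    def __init__(self, num_topics: int, ridge: float = 1e-6) -> None:
        self.k = num_topics
        self.ridge = ridge  # small regularizer keeps G + ridge*I invertible
        self.G = [[0.0] * num_topics for _ in range(num_topics)]
        self.H = [[0.0] * num_topics for _ in range(num_topics)]

    def add_pair(self, prev: Vector, nxt: Vector) -> None:
        # Rank-one updates: G += prev^T prev and H += prev^T nxt.
        for i in range(self.k):
            for j in range(self.k):
                self.G[i][j] += prev[i] * prev[j]
                self.H[i][j] += prev[i] * nxt[j]

    def transition_matrix(self) -> Matrix:
        # Solve (G + ridge*I) T = H by Gauss-Jordan elimination
        # with partial pivoting on the augmented matrix [G | H].
        k = self.k
        aug = [self.G[i][:] + self.H[i][:] for i in range(k)]
        for i in range(k):
            aug[i][i] += self.ridge
        for col in range(k):
            pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
            aug[col], aug[pivot] = aug[pivot], aug[col]
            p = aug[col][col]
            aug[col] = [v / p for v in aug[col]]
            for r in range(k):
                if r != col:
                    f = aug[r][col]
                    aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
        # The right half of the reduced augmented matrix is T.
        return [row[k:] for row in aug]

    def predict(self, current: Vector) -> Vector:
        # Predicted next distribution = current @ T. Clipping negatives and
        # renormalizing is an assumption made so the output is a distribution.
        T = self.transition_matrix()
        raw = [max(0.0, sum(current[i] * T[i][j] for i in range(self.k)))
               for j in range(self.k)]
        total = sum(raw)
        return [v / total for v in raw] if total > 0 else [1.0 / self.k] * self.k


if __name__ == "__main__":
    # Toy stream: topic 0 is always followed by topic 1, and vice versa.
    model = TransitionModel(num_topics=2)
    model.add_pair([1.0, 0.0], [0.0, 1.0])
    model.add_pair([0.0, 1.0], [1.0, 0.0])
    print(model.predict([0.8, 0.2]))  # prints values close to [0.2, 0.8]
```

In this sketch, folding in one new pair costs O(K²) and re-solving for T costs O(K³) for K topics, independent of how many posts have already been seen. That independence is the property an online setting needs, though an incremental solver could also avoid the full re-solve.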