Uncovering the topics within short texts, such as tweets and instant messages, has become an important task for many content analysis applications. However, directly applying conventional topic models (e.g., LDA and PLSA) to such short texts may not work well. The fundamental reason is that conventional topic models implicitly capture document-level word co-occurrence patterns to reveal topics, and thus suffer from the severe data sparsity of short documents. In this paper, we propose a novel way of modeling topics in short texts, referred to as the biterm topic model (BTM). Specifically, BTM learns topics by directly modeling the generation of word co-occurrence patterns (i.e., biterms) over the whole corpus. The major advantages of BTM are that 1) it explicitly models word co-occurrence patterns to enhance topic learning, and 2) it aggregates these patterns over the whole corpus, alleviating the sparsity of word co-occurrences at the document level. We carry out extensive experiments on real-world short text collections. The results demonstrate that our approach discovers more prominent and coherent topics and significantly outperforms baseline methods on several evaluation metrics. Furthermore, we find that BTM can outperform LDA even on normal-length texts, suggesting the generality and wider applicability of the new model.
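To make the notion of a biterm concrete, the following is a minimal sketch (not the authors' implementation) of the corpus-level biterm extraction step the abstract describes: each short document, assumed to be already tokenized, contributes every unordered pair of its words, and the pairs from all documents are pooled into one corpus-wide collection that the model is then fit on.

```python
from itertools import combinations
from collections import Counter

def extract_biterms(doc_tokens):
    """Return all unordered word pairs (biterms) from one tokenized short document.

    Pairs are sorted so that ('apple', 'fruit') and ('fruit', 'apple')
    count as the same biterm.
    """
    return [tuple(sorted(pair)) for pair in combinations(doc_tokens, 2)]

# Hypothetical toy corpus of pre-tokenized short documents.
corpus = [
    ["apple", "fruit", "sweet"],
    ["apple", "phone", "screen"],
]

# Pool biterms over the whole corpus, as BTM does, rather than per document.
corpus_biterms = Counter(b for doc in corpus for b in extract_biterms(doc))
print(corpus_biterms[("apple", "fruit")])
```

Pooling the biterms in this way is what lets the model sidestep per-document sparsity: even if each short text yields only a handful of pairs, the aggregated counts across the corpus are rich enough to estimate topic-word distributions.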