Probabilistic latent semantic indexing
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Introduction to Modern Information Retrieval
Introduction to Modern Information Retrieval
The Journal of Machine Learning Research
LDA-based document models for ad-hoc retrieval
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Unsupervised prediction of citation influences
Proceedings of the 24th international conference on Machine learning
MapReduce: simplified data processing on large clusters
Communications of the ACM - 50th anniversary issue: 1958 - 2008
Why we twitter: understanding microblogging usage and communities
Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 workshop on Web mining and social network analysis
Topic modeling with network regularization
Proceedings of the 17th international conference on World Wide Web
Proceedings of the first workshop on Online social networks
Text Mining: Classification, Clustering, and Applications
Text Mining: Classification, Clustering, and Applications
PLDA: Parallel Latent Dirichlet Allocation for Large-Scale Applications
AAIM '09 Proceedings of the 5th International Conference on Algorithmic Aspects in Information and Management
HTM: a topic model for hypertexts
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Distributed Algorithms for Topic Models
The Journal of Machine Learning Research
An architecture for parallel topic models
Proceedings of the VLDB Endowment
Analyzing microblogs with affinity propagation
Proceedings of the First Workshop on Social Media Analytics
Expectation-propagation for the generative aspect model
UAI'02 Proceedings of the Eighteenth conference on Uncertainty in artificial intelligence
Survey of clustering algorithms
IEEE Transactions on Neural Networks
Information propagation in microblog networks
Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
Hi-index | 0.00 |
In the information explosion era, large scale data processing and mining is a hot issue. As microblog grows more popular, microblog services have become information provider on a web scale, so researches on microblog begin to focus more on its content mining than solely user's relationship analysis before. Although traditional text mining methods have been studied well, no algorithm is designed specially for microblog data, which contain structured information on social network besides plain text. In this paper, we introduce a novel probabilistic generative model MicroBlog-Latent Dirichlet Allocation (MB-LDA), which takes both contactor relevance relation and document relevance relation into consideration to improve topic mining in microblogs. Through Gibbs sampling for approximate inference of our model, MB-LDA can discover not only the topics of microblogs, but also the topics focused by contactors. When faced with large datasets, traditional techniques on single node become less practical within limited resources. So we present distributed MB-LDA in MapReduce framework in order to process large scale microblogs with high scalability. Furthermore, we apply a performance model to optimize the execution time by tuning the number of mappers and reducers. Experimental results on actual dataset show MB-LDA outperforms the baseline of LDA and distributed MB-LDA offers an effective solution to topic mining for large scale microblogs.