This paper presents a topic model that extracts topic evolutions as a corpus-wide transition matrix over latent topics. Recent trends in text mining point to a strong demand for exploiting metadata, in particular the reference relationships among documents that arise from hyperlinking Web pages, citing scientific articles, reblogging Tumblr posts, retweeting tweets, and so on. We focus on scholarly activity and propose a topic model that gives a corpus-wide view of how research topics evolve along citation relationships. Our model, called TERESA, extends latent Dirichlet allocation (LDA) with a corpus-wide topic transition probability matrix that models reference relationships as transitions among topics. Our approximate variational inference alternates between updating the LDA posteriors and the topic transition posteriors. The main issue is the execution time of O(MK²), where K is the number of topics and M is the number of links in the citation network; we therefore accelerate the inference on Nvidia CUDA-compatible GPUs. We compare the effectiveness of TERESA with that of LDA using a new measure we introduce, diversity plus focusedness (D+F), and present examples of topic evolutions extracted by our method.
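To make the O(MK²) cost concrete, the following is a minimal NumPy sketch of one alternating step that re-estimates a topic transition matrix from citation links. It is an illustration under our own assumptions, not TERESA's actual variational update equations: the arrays `theta` (per-document expected topic proportions) and `links` (citing/cited index pairs), and the symmetric Dirichlet pseudo-count `prior`, are hypothetical names introduced here for the sketch.

import numpy as np

def update_transition_matrix(theta, links, prior=0.1):
    """Hypothetical sketch of one alternating step: re-estimate a
    K x K topic transition matrix from expected topic proportions.
    Not TERESA's actual update; shown only to illustrate complexity.

    theta: (D, K) expected topic proportions per document.
    links: iterable of (citing_doc, cited_doc) index pairs.
    prior: symmetric Dirichlet pseudo-count on transitions.
    """
    K = theta.shape[1]
    counts = np.full((K, K), prior)      # prior pseudo-counts
    for src, dst in links:               # M links in the citation network
        # each link contributes a K x K outer product: O(K^2) per link
        counts += np.outer(theta[src], theta[dst])
    # row-normalize so each row is a distribution over target topics
    return counts / counts.sum(axis=1, keepdims=True)

Each of the M links contributes a K x K outer product, which is where the O(MK²) total comes from; these per-link products are independent of one another, which is why the computation parallelizes naturally onto CUDA-compatible GPUs as the paper does.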