Extraction of topic evolutions from references in scientific articles and its GPU acceleration

  • Authors:
  • Tomonari Masada; Atsuhiro Takasu

  • Affiliations:
  • Nagasaki University, Nagasaki, Japan; National Institute of Informatics, Tokyo, Japan

  • Venue:
  • Proceedings of the 21st ACM international conference on Information and knowledge management
  • Year:
  • 2012

Abstract

This paper presents a topic model for extracting topic evolutions as a corpus-wide transition matrix among latent topics. Recent trends in text mining point to a high demand for exploiting metadata. In particular, reference relationships among documents, induced by hyperlinking Web pages, citing scientific articles, reblogging blog posts, retweeting tweets, and so on, have come to the foreground of efforts toward effective mining. We focus on scholarly activities and propose a topic model for obtaining a corpus-wide view of how research topics evolve along citation relationships. Our model, called TERESA, extends latent Dirichlet allocation (LDA) by introducing a corpus-wide topic transition probability matrix, which models reference relationships as transitions among topics. Our approximated variational inference alternately updates the LDA posteriors and the topic transition posteriors. The main issue is the execution time, which amounts to O(MK²), where K is the number of topics and M is the number of links in the citation network. Therefore, we accelerate the inference with Nvidia CUDA-compatible GPUs. We compare the effectiveness of TERESA with that of LDA by introducing a new measure called diversity plus focusedness (D+F). We also present examples of topic evolutions extracted by our method.
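
The O(MK²) cost mentioned above arises because every citation link must touch every pair of topics. Below is a minimal CUDA sketch of that kind of per-link computation: for each link, it accumulates expected topic-transition counts from the topic proportions of the citing and cited documents. The kernel, the array names (theta, src, dst, counts), and the simple product form of the accumulation are illustrative assumptions, not the paper's actual variational update equations.

// Sketch of the O(MK^2) per-link work: one block per citation link,
// threads stride over the K*K topic pairs. Names and update form are
// assumptions for illustration, not TERESA's actual updates.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void accumulate_transition_counts(
    const float* __restrict__ theta,   // D x K topic proportions, row-major
    const int*   __restrict__ src,     // M citing-document indices
    const int*   __restrict__ dst,     // M cited-document indices
    float*       __restrict__ counts,  // K x K expected transition counts
    int K)
{
    int m = blockIdx.x;                       // link index
    const float* citing = theta + src[m] * K;
    const float* cited  = theta + dst[m] * K;

    for (int p = threadIdx.x; p < K * K; p += blockDim.x) {
        int k = p / K, l = p % K;
        // Expected co-occurrence of topic k (citing side) with topic l (cited side).
        atomicAdd(&counts[p], citing[k] * cited[l]);
    }
}

int main() {
    const int D = 4, K = 8, M = 3;
    // Toy host data: uniform topic proportions and three citation links.
    float h_theta[D * K];
    for (int i = 0; i < D * K; ++i) h_theta[i] = 1.0f / K;
    int h_src[M] = {0, 1, 2}, h_dst[M] = {1, 2, 3};

    float *d_theta, *d_counts;
    int *d_src, *d_dst;
    cudaMalloc((void**)&d_theta,  sizeof(h_theta));
    cudaMalloc((void**)&d_counts, K * K * sizeof(float));
    cudaMalloc((void**)&d_src,    sizeof(h_src));
    cudaMalloc((void**)&d_dst,    sizeof(h_dst));
    cudaMemcpy(d_theta, h_theta, sizeof(h_theta), cudaMemcpyHostToDevice);
    cudaMemcpy(d_src,   h_src,   sizeof(h_src),   cudaMemcpyHostToDevice);
    cudaMemcpy(d_dst,   h_dst,   sizeof(h_dst),   cudaMemcpyHostToDevice);
    cudaMemset(d_counts, 0, K * K * sizeof(float));

    accumulate_transition_counts<<<M, 64>>>(d_theta, d_src, d_dst, d_counts, K);

    float h_counts[K * K];
    cudaMemcpy(h_counts, d_counts, sizeof(h_counts), cudaMemcpyDeviceToHost);
    printf("counts[0][0] = %f\n", h_counts[0]);

    cudaFree(d_theta); cudaFree(d_counts); cudaFree(d_src); cudaFree(d_dst);
    return 0;
}

Launching one block per link keeps the K² work for each citation on the GPU; for larger K, accumulating into per-block shared memory before a single merge into the global counts matrix would reduce atomic contention.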