Understanding evolution of research themes: a probabilistic generative model for citations

  • Authors:
  • Xiaolong Wang;Chengxiang Zhai;Dan Roth

  • Affiliations:
  • University of Illinois, Urbana-Champaign, Urbana, IL, USA;University of Illinois, Urbana-Champaign, Urbana, IL, USA;University of Illinois, Urbana-Champaign, Urbana, IL, USA

  • Venue:
  • Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

Understanding how research themes evolve over time in a research community is useful in many ways (e.g., revealing important milestones and discovering emerging major research trends). In this paper, we propose a novel way of analyzing literature citation to explore the research topics and the theme evolution by modeling article citation relations with a probabilistic generative model. The key idea is to represent a research paper by a ``bag of citations'' and model such a ``citation document'' with a probabilistic topic model. We explore the extension of a particular topic model, i.e., Latent Dirichlet Allocation~(LDA), for citation analysis, and show that such a Citation-LDA can facilitate discovering of individual research topics as well as the theme evolution from multiple related topics, both of which in turn lead to the construction of evolution graphs for characterizing research themes. We test the proposed citation-LDA on two datasets: the ACL Anthology Network(AAN) of natural language research literatures and PubMed Central(PMC) archive of biomedical and life sciences literatures, and demonstrate that Citation-LDA can effectively discover the evolution of research themes, with better formed topics than (conventional) Content-LDA.