Knowledge Discovery from Citation Networks

  • Authors:
  • Zhen Guo; Zhongfei Zhang; Shenghuo Zhu; Yun Chi; Yihong Gong

  • Venue:
  • ICDM '09 Proceedings of the 2009 Ninth IEEE International Conference on Data Mining
  • Year:
  • 2009

Abstract

Knowledge discovery from scientific articles has received increasing attention recently, as huge repositories have been made available by the development of the Internet and digital databases. In a corpus of scientific articles such as a digital library, documents are connected by citations, and each document plays two different roles in the corpus: the document itself and a citation of other documents. Existing topic models make little effort to differentiate these two roles. We believe that the topic distributions of these two roles are different but related. In this paper we propose a Bernoulli Process Topic (BPT) model, which models the corpus at two levels: the document level and the citation level. In the BPT model, each document has two different representations in the latent topic space, one for each of its roles. Moreover, the multi-level hierarchical structure of the citation network is captured by a generative process involving a Bernoulli process. The distribution parameters of the BPT model are estimated by a variational approximation approach. In addition to experimental evaluations on the document modeling task, we also apply the BPT model to a well-known scientific corpus to discover the latent topics. Comparisons against state-of-the-art methods demonstrate very promising performance.
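
To make the two-level idea concrete, the following is a minimal illustrative sketch, not the model as specified in the paper. It assumes one possible reading of the abstract: each document has a citation-level topic distribution, and its document-level distribution arises from a walk over the citation graph in which every step is a Bernoulli trial (stop at the current document, or follow one of its citations). The names `sample_topic`, `document_level_topics`, `stop_prob`, and the toy corpus are all hypothetical, and the variational estimation mentioned in the abstract is not covered.

```python
import random


def sample_topic(doc, citations, citation_topics, stop_prob, rng):
    """Draw one topic for `doc` with a stop/continue walk over its citations.

    Assumption for illustration only: at each step a Bernoulli trial with
    success probability `stop_prob` decides whether the walk stops at the
    current document; otherwise it moves to a uniformly chosen cited document.
    """
    current = doc
    # Keep walking while the current document cites something
    # and the Bernoulli trial says "continue".
    while citations.get(current) and rng.random() >= stop_prob:
        current = rng.choice(citations[current])
    # Draw a topic from the citation-level distribution of the stopping point.
    weights = citation_topics[current]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def document_level_topics(doc, citations, citation_topics, stop_prob,
                          n_samples=20000, seed=0):
    """Monte Carlo estimate of the document-level topic distribution of `doc`."""
    rng = random.Random(seed)
    counts = [0] * len(citation_topics[doc])
    for _ in range(n_samples):
        counts[sample_topic(doc, citations, citation_topics, stop_prob, rng)] += 1
    return [c / n_samples for c in counts]


# Hypothetical toy corpus: A cites B, B cites C, C cites nothing.
citations = {"A": ["B"], "B": ["C"], "C": []}
# Hypothetical citation-level topic distributions, one pure topic per document.
citation_topics = {
    "A": [1.0, 0.0, 0.0],
    "B": [0.0, 1.0, 0.0],
    "C": [0.0, 0.0, 1.0],
}

theta_a = document_level_topics("A", citations, citation_topics, stop_prob=0.5)
print(theta_a)
```

In this toy setting the walk from A stops at A with probability 0.5, at B with probability 0.25, and otherwise at C, so the estimate for A should be close to (0.5, 0.25, 0.25). The point of the sketch is only that a document's own topic mixture can differ from, yet depend on, the citation-level mixtures of the documents it reaches through multiple levels of citation.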