Sequential Modeling of Topic Dynamics with Multiple Timescales

  • Authors:
  • Tomoharu Iwata;Takeshi Yamada;Yasushi Sakurai;Naonori Ueda

  • Affiliations:
  • NTT Communication Science Laboratories;NTT Science and Core Technology Laboratory Group;NTT Communication Science Laboratories;NTT Communication Science Laboratories

  • Venue:
  • ACM Transactions on Knowledge Discovery from Data (TKDD)
  • Year:
  • 2012

Abstract

We propose an online topic model for sequentially analyzing the time evolution of topics in document collections. Topics naturally evolve on multiple timescales. For example, some words may be used consistently for over a hundred years, while other words emerge and disappear within a few days. Thus, in the proposed model, the current topic-specific distributions over words are assumed to be generated from the multiscale word distributions of the previous epoch. Considering both long- and short-timescale dependencies yields a more robust model. We derive efficient online inference procedures based on a stochastic EM algorithm, in which the model is updated sequentially using newly obtained data; past data are therefore not required for inference. We demonstrate the effectiveness of the proposed method, in terms of predictive performance and computational efficiency, on collections of real timestamped documents.
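The multiscale idea in the abstract can be illustrated with a minimal sketch. It is not the paper's inference procedure: it assumes, purely for illustration, that each timescale is an empirical word distribution pooled over a fixed window of recent epochs, and that the prior mean for the current epoch is a weighted mixture of those distributions. The window lengths, the weights, the toy vocabulary, and the function names are all hypothetical.

```python
from collections import Counter

def normalize(counts, vocab):
    """Turn raw word counts into a probability distribution over vocab."""
    total = sum(counts.get(w, 0) for w in vocab)
    if total == 0:
        # No evidence at this scale: fall back to a uniform distribution.
        return {w: 1.0 / len(vocab) for w in vocab}
    return {w: counts.get(w, 0) / total for w in vocab}

def multiscale_distributions(epoch_counts, windows, vocab):
    """One distribution per timescale for a single topic.

    epoch_counts: list of Counter, oldest epoch first.
    windows: positive window lengths in epochs (assumed, e.g. 1, 2, 4).
    """
    dists = []
    for span in windows:
        pooled = Counter()
        for counts in epoch_counts[-span:]:  # the last `span` epochs
            pooled.update(counts)
        dists.append(normalize(pooled, vocab))
    return dists

def prior_mean(dists, weights, vocab):
    """Weighted mixture of the per-scale distributions (assumed form)."""
    z = sum(weights)
    return {w: sum(lam * d[w] for lam, d in zip(weights, dists)) / z
            for w in vocab}

# Toy data for one topic over four past epochs (hypothetical counts).
vocab = ["model", "data", "election", "storm"]
history = [
    Counter(model=8, data=2),
    Counter(model=6, data=4),
    Counter(model=5, data=3, election=2),
    Counter(model=4, data=2, storm=4),
]
windows = [1, 2, 4]          # short, medium, and long timescales
weights = [0.5, 0.3, 0.2]    # hypothetical per-scale weights

dists = multiscale_distributions(history, windows, vocab)
prior = prior_mean(dists, weights, vocab)
```

In this toy run, "storm" appears only in the most recent epoch, so it carries more mass at the short timescale than at the long one, while "model" stays prominent at every scale. In the setting the abstract describes, the weighting of the scales would presumably be estimated during inference rather than fixed by hand as it is here.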