Online multiscale dynamic topic models

  • Authors:
  • Tomoharu Iwata;Takeshi Yamada;Yasushi Sakurai;Naonori Ueda

  • Affiliations:
  • NTT, Kyoto, Japan;NTT, Kyoto, Japan;NTT, Kyoto, Japan;NTT, Kyoto, Japan

  • Venue:
  • Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
  • Year:
  • 2010

Abstract

We propose an online topic model for sequentially analyzing the time evolution of topics in document collections. Topics naturally evolve on multiple timescales. For example, some words may be used consistently over one hundred years, while other words emerge and disappear over periods of a few days. Thus, in the proposed model, the current topic-specific distributions over words are assumed to be generated from the multiscale word distributions of the previous epoch. Considering both the long-timescale and the short-timescale dependencies yields a more robust model. We derive efficient online inference procedures based on a stochastic EM algorithm, in which the model is sequentially updated using newly obtained data; this means that past data are not required to make the inference. We demonstrate the effectiveness of the proposed method in terms of predictive performance and computational efficiency by examining collections of real documents with timestamps.
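
To make the multiscale idea concrete, the following is a minimal Python sketch of how word distributions at several timescales might be combined into a prior mean for the current epoch. It is an illustration under stated assumptions, not the authors' inference procedure: the function names, the doubling window lengths (1, 2, 4, ... epochs), and the fixed mixing weights are all hypothetical. Unlike the online inference described in the abstract, the sketch keeps per-epoch counts in memory for clarity.

```python
def multiscale_distributions(history, num_scales):
    """Return one word distribution per timescale for a single topic.

    history: list of per-epoch word-count lists, oldest epoch first.
    Scale s pools the most recent 2**s epochs (an assumed window scheme),
    so scale 0 is the shortest timescale and higher scales are longer.
    """
    vocab_size = len(history[0])
    scales = []
    for s in range(num_scales):
        window = history[-(2 ** s):]  # slicing clips to the available epochs
        pooled = [sum(epoch[w] for epoch in window) for w in range(vocab_size)]
        total = sum(pooled)
        if total == 0:
            # No observations in this window: fall back to a uniform distribution.
            scales.append([1.0 / vocab_size] * vocab_size)
        else:
            scales.append([count / total for count in pooled])
    return scales


def prior_mean(scales, weights):
    """Convex combination of the per-scale distributions.

    weights: one non-negative weight per scale (hypothetical fixed values here).
    """
    vocab_size = len(scales[0])
    norm = sum(weights)
    return [
        sum(wt * dist[w] for wt, dist in zip(weights, scales)) / norm
        for w in range(vocab_size)
    ]


# Toy example: three epochs of counts over a three-word vocabulary.
history = [[8, 2, 0], [6, 2, 2], [0, 2, 8]]
scales = multiscale_distributions(history, num_scales=2)
prior = prior_mean(scales, weights=[1.0, 1.0])
```

In the toy example, the first word dominated earlier epochs but is absent from the latest one. The short-timescale distribution gives it zero mass, while the longer-timescale distribution keeps some mass on it, so the combined prior does not discard it entirely. This is one way to read the abstract's claim that using both dependencies yields a more robust model.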