Bag of Timestamps: A Simple and Efficient Bayesian Chronological Mining

  • Authors:
  • Tomonari Masada; Atsuhiro Takasu; Tsuyoshi Hamada; Yuichiro Shibata; Kiyoshi Oguri

  • Affiliations:
  • Nagasaki University, Nagasaki, Japan; National Institute of Informatics, Tokyo, Japan; Nagasaki University, Nagasaki, Japan; Nagasaki University, Nagasaki, Japan; Nagasaki University, Nagasaki, Japan

  • Venue:
  • APWeb/WAIM '09 Proceedings of the Joint International Conferences on Advances in Data and Web Management
  • Year:
  • 2009

Abstract

In this paper, we propose a new probabilistic model, Bag of Timestamps (BoT), for chronological text mining. BoT is an extension of latent Dirichlet allocation (LDA) and has two notable advantages over the previously proposed Topics over Time (ToT) model, which is also an extension of LDA. First, BoT can avoid overfitting to temporal data, because it models timestamps in a Bayesian manner, much as LDA models word frequencies. Second, the conditional probability in BoT contains no functions that require time-consuming computation. Experiments on newswire documents show that BoT fits temporal data more moderately than ToT and does so in a shorter execution time.
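The second advantage can be made concrete with a collapsed Gibbs sampler. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes timestamps are discretized into `T` bins, that each document carries a bag of timestamp tokens that share the document's topic counts with its words, and that symmetric Dirichlet priors `alpha`, `beta`, and `gamma` are used. The function names `bot_gibbs` and `sample_topic` and the toy data are hypothetical. The point it shows is that the update for a timestamp token has the same form as the one for a word token: counts, additions, and one division. In ToT, by contrast, each token's conditional also includes a Beta density evaluated at a normalized timestamp, which involves power functions and the Beta function.

```python
import random
from typing import List, Tuple


def sample_topic(weights: List[float], rng: random.Random) -> int:
    """Draw an index with probability proportional to the (positive) weights."""
    r = rng.random() * sum(weights)
    acc = 0.0
    for k, w in enumerate(weights):
        acc += w
        if r < acc:
            return k
    return len(weights) - 1  # guard against floating-point round-off


def bot_gibbs(docs: List[List[int]], stamps: List[List[int]], K: int, V: int, T: int,
              alpha: float = 0.1, beta: float = 0.01, gamma: float = 0.01,
              iters: int = 50, seed: int = 0) -> Tuple[list, list, list]:
    """Hypothetical collapsed Gibbs sampler: word ids lie in [0, V), timestamp
    bins lie in [0, T), and both token types share per-document topic counts."""
    rng = random.Random(seed)
    D = len(docs)
    n_dk = [[0] * K for _ in range(D)]  # tokens (words + stamps) in doc d with topic k
    n_kw = [[0] * V for _ in range(K)]  # occurrences of word w assigned to topic k
    n_k = [0] * K                       # total words assigned to topic k
    m_kt = [[0] * T for _ in range(K)]  # occurrences of time bin t assigned to topic k
    m_k = [0] * K                       # total timestamps assigned to topic k

    # Random initial topic assignments for every word and timestamp token.
    zw = [[rng.randrange(K) for _ in doc] for doc in docs]
    zt = [[rng.randrange(K) for _ in ts] for ts in stamps]
    for d in range(D):
        for w, k in zip(docs[d], zw[d]):
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        for t, k in zip(stamps[d], zt[d]):
            n_dk[d][k] += 1
            m_kt[k][t] += 1
            m_k[k] += 1

    for _ in range(iters):
        for d in range(D):
            for i, w in enumerate(docs[d]):
                k = zw[d][i]  # remove the token's current assignment
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                weights = [(n_dk[d][j] + alpha) * (n_kw[j][w] + beta) / (n_k[j] + V * beta)
                           for j in range(K)]
                k = sample_topic(weights, rng)
                zw[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
            for i, t in enumerate(stamps[d]):
                # Same form as the word update: no Beta density, no power functions.
                k = zt[d][i]
                n_dk[d][k] -= 1
                m_kt[k][t] -= 1
                m_k[k] -= 1
                weights = [(n_dk[d][j] + alpha) * (m_kt[j][t] + gamma) / (m_k[j] + T * gamma)
                           for j in range(K)]
                k = sample_topic(weights, rng)
                zt[d][i] = k
                n_dk[d][k] += 1
                m_kt[k][t] += 1
                m_k[k] += 1
    return n_dk, n_kw, m_kt


if __name__ == "__main__":
    # Hypothetical toy corpus: two "early" and two "late" documents, with each
    # document's time bin repeated three times to form its bag of timestamps.
    docs = [[0, 1, 0, 1], [0, 1, 1, 0], [2, 3, 2, 3], [3, 2, 3, 2]]
    stamps = [[0, 0, 0], [0, 0, 0], [1, 1, 1], [1, 1, 1]]
    n_dk, n_kw, m_kt = bot_gibbs(docs, stamps, K=2, V=4, T=2, iters=20)
    print("topic-by-time-bin counts:", m_kt)
```

Because the timestamp factor in this sketch is a Dirichlet-smoothed count ratio, a topic's estimated distribution over time bins, (m_kt + gamma) / (m_k + T * gamma), is pulled toward uniform when the counts are small. This illustrates the kind of Bayesian smoothing the abstract credits with reducing overfitting to temporal data.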