Topic extraction based on prior knowledge obtained from target documents

  • Authors:
  • Kayo Tatsukawa; Ichiro Kobayashi

  • Affiliations:
  • Ochanomizu University, Ohtsuka Bunkyo-ku Tokyo, JAPAN; Ochanomizu University, Ohtsuka Bunkyo-ku Tokyo, JAPAN

  • Venue:
  • ACL '12 Proceedings of ACL 2012 Student Research Workshop
  • Year:
  • 2012

Abstract

This paper investigates the relation between prior knowledge and latent topic classification. In many cases, the topic classification produced by Latent Dirichlet Allocation differs from the classification humans expect. To address this problem, several studies have replaced the Dirichlet distribution with a Dirichlet Forest prior, which imposes constraints on words so that they are either assigned to the same topic or kept in different topics. However, such prior knowledge is often constructed from a subjective human viewpoint rather than from the properties of the target documents. In this study, we construct prior knowledge from words extracted from the target documents and provide it as constraints for topic classification. We discuss the results of topic classification under these constraints.
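The abstract does not say how the extracted constraints are encoded. Below is a hedged Python sketch that assumes they take the Must-Link / Cannot-Link form used with Dirichlet Forest priors. It shows one way to preprocess word pairs before they reach the prior. Must-link pairs are merged transitively into word groups with a union-find structure. Cannot-link pairs are then lifted to pairs of groups, and any pair that falls inside a single group is rejected as contradictory. The function names and example words are hypothetical, this is not the authors' implementation, and the Dirichlet tree itself is not built here.

```python
class UnionFind:
    """Disjoint-set structure used to take the transitive closure of must-link pairs."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def build_constraints(must_links, cannot_links):
    """Hypothetical helper: merge must-linked words into groups and map
    cannot-links onto pairs of groups.

    Raises ValueError when a cannot-link pair lies inside one must-link group,
    because the two constraints would contradict each other.
    """
    uf = UnionFind()
    for a, b in must_links:
        uf.union(a, b)
    for a, b in cannot_links:
        # Register cannot-linked words so that each belongs to some group.
        uf.find(a)
        uf.find(b)

    # Collect the members of each connected component.
    members = {}
    for word in list(uf.parent):
        members.setdefault(uf.find(word), set()).add(word)
    group = {root: frozenset(words) for root, words in members.items()}

    cannot = set()
    for a, b in cannot_links:
        ga, gb = group[uf.find(a)], group[uf.find(b)]
        if ga == gb:
            raise ValueError(f"conflicting constraints on {a!r} and {b!r}")
        cannot.add(frozenset((ga, gb)))
    return set(group.values()), cannot


# Usage with made-up word pairs.
must = [("game", "team"), ("team", "player"), ("stock", "market")]
cannot = [("player", "stock")]
groups, cannot_groups = build_constraints(must, cannot)
print(sorted(sorted(g) for g in groups))
```

The union-find step reflects the idea that must-links behave transitively: if "game" must share a topic with "team", and "team" with "player", then all three words end up in one group. A cannot-link between any member of that group and "stock" therefore separates the whole group from the "stock"/"market" group.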