Supervised N-gram topic model

  • Authors:
  • Noriaki Kawamae

  • Affiliations:
  • Tokyo Denki University, Tokyo, Japan

  • Venue:
  • Proceedings of the 7th ACM international conference on Web search and data mining
  • Year:
  • 2014

Abstract

We propose a Bayesian nonparametric topic model that represents relationships between given labels and the corresponding words/phrases in supervised articles. Unlike existing supervised topic models, our proposal, the supervised N-gram topic model (SNT), focuses on both the number of topics and the power-law distribution of word frequencies to extract topic-specific N-grams. To achieve this goal, SNT takes a Bayesian nonparametric approach to topic sampling, which generates the word distribution jointly with the given variable in textual order, and then forms each N-gram word as a hierarchy of Pitman-Yor process priors. Experiments on labeled text data show that SNT is more useful as a generative model for discovering phrases that complement human experts and domain-specific knowledge than the existing alternatives. The results show that SNT can be applied to various tasks such as automatic annotation.
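The abstract's appeal to Pitman-Yor process priors rests on their power-law behavior: under the two-parameter Chinese restaurant process (CRP) representation, table sizes follow a heavy-tailed distribution that matches natural word frequencies better than a Dirichlet prior. The sketch below is not the SNT model itself, only a minimal illustration of a single Pitman-Yor CRP draw (discount `d`, concentration `theta` are standard PY parameters; the function name is hypothetical):

```python
import random

def pitman_yor_crp(n, d=0.5, theta=1.0, seed=0):
    """Seat n customers via the two-parameter Chinese restaurant
    process (Pitman-Yor). Returns the list of table occupancy
    counts; their sizes exhibit the power-law behavior the
    abstract cites for word frequencies."""
    rng = random.Random(seed)
    tables = []  # customers seated at each table
    for i in range(n):
        total = i  # customers seated so far
        k = len(tables)
        # probability of opening a new table: (theta + d*K) / (theta + n)
        p_new = (theta + d * k) / (theta + total) if total else 1.0
        if rng.random() < p_new:
            tables.append(1)
        else:
            # pick an existing table with probability proportional to (count - d)
            r = rng.uniform(0, total - d * k)
            acc = 0.0
            for j, c in enumerate(tables):
                acc += c - d
                if r <= acc:
                    tables[j] += 1
                    break
            else:
                tables[-1] += 1  # guard against float rounding

    return tables

counts = pitman_yor_crp(1000)
print(len(counts), sum(counts))  # number of tables, total customers (1000)
```

In a hierarchical arrangement, as SNT uses for N-grams, each restaurant's base distribution is itself a draw from a parent Pitman-Yor process, so longer contexts back off to shorter ones; this sketch shows only one level.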