Crouching Dirichlet, hidden Markov model: unsupervised POS tagging with context local tag generation

  • Authors:
  • Taesun Moon;Katrin Erk;Jason Baldridge

  • Affiliations:
  • University of Texas at Austin, Austin, TX;University of Texas at Austin, Austin, TX;University of Texas at Austin, Austin, TX

  • Venue:
  • EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

We define the crouching Dirichlet, hidden Markov model (CDHMM), an HMM for part-of-speech tagging which draws state prior distributions for each local document context. This simple modification of the HMM takes advantage of the dichotomy in natural language between content and function words. In contrast, a standard HMM draws all prior distributions once over all states and it is known to perform poorly in unsupervised and semi-supervised POS tagging. This modification significantly improves unsupervised POS tagging performance across several measures on five data sets for four languages. We also show that simply using different hyperparameter values for content and function word states in a standard HMM (which we call HMM+) is surprisingly effective.