A hierarchical dirichlet process model for joint part-of-speech and morphology induction

  • Authors:
  • Kairit Sirts;Tanel Alumäe

  • Affiliations:
  • Tallinn University of Technology;Tallinn University of Technology

  • Venue:
  • NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

In this paper we present a fully unsupervised nonparametric Bayesian model that jointly induces POS tags and morphological segmentations. The model is essentially an infinite HMM that infers the number of states from data. Incorporating segmentation into the same model provides the morphological features to the system and eliminates the need to find them during preprocessing step. We show that learning both tasks jointly actually leads to better results than learning either task with gold standard data from the other task provided. The evaluation on multilingual data shows that the model produces state-of-the-art results on POS induction.