Unsupervised multilingual learning for POS tagging

  • Authors:
  • Benjamin Snyder;Tahira Naseem;Jacob Eisenstein;Regina Barzilay

  • Affiliations:
  • Massachusetts Institute of Technology, Cambridge, MA;Massachusetts Institute of Technology, Cambridge, MA;Massachusetts Institute of Technology, Cambridge, MA;Massachusetts Institute of Technology, Cambridge, MA

  • Venue:
  • EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

We demonstrate the effectiveness of multilingual learning for unsupervised part-of-speech tagging. The key hypothesis of multilingual learning is that by combining cues from multiple languages, the structure of each becomes more apparent. We formulate a hierarchical Bayesian model for jointly predicting bilingual streams of part-of-speech tags. The model learns language-specific features while capturing cross-lingual patterns in tag distribution for aligned words. Once the parameters of our model have been learned on bilingual parallel data, we evaluate its performance on a held-out monolingual test set. Our evaluation on six pairs of languages shows consistent and significant performance gains over a state-of-the-art monolingual baseline. For one language pair, we observe a relative reduction in error of 53%.