Unsupervised part-of-speech tagging employing efficient graph clustering

  • Authors:
  • Chris Biemann

  • Affiliations:
  • University of Leipzig, Leipzig, Germany

  • Venue:
  • COLING ACL '06 Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop
  • Year:
  • 2006

Abstract

An unsupervised part-of-speech (POS) tagging system that relies on graph clustering methods is described. Unlike in current state-of-the-art approaches, the kind and number of different tags are generated by the method itself. We compute and merge two partitionings of word graphs: one based on the context similarity of high-frequency words, the other on log-likelihood statistics for lower-frequency words. Using the resulting word clusters as a lexicon, a Viterbi POS tagger is trained and then refined by a morphological component. The approach is evaluated on three different languages by measuring agreement with existing taggers.
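
The first partitioning step (grouping high-frequency words by context similarity) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not name the clustering algorithm, so a simple deterministic label propagation stands in for it, and the toy corpus, the cosine threshold of 0.5, and all function names are hypothetical rather than taken from the paper.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences):
    """Count the left and right neighbour words of every word type."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for i in range(1, len(tokens) - 1):
            vectors[tokens[i]][("L", tokens[i - 1])] += 1
            vectors[tokens[i]][("R", tokens[i + 1])] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[feature] for feature, count in u.items())
    norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def similarity_graph(vectors, threshold=0.5):
    """Connect two words when their context similarity reaches the threshold (assumed value)."""
    words = sorted(vectors)
    graph = {word: {} for word in words}
    for i, u in enumerate(words):
        for v in words[i + 1:]:
            sim = cosine(vectors[u], vectors[v])
            if sim >= threshold:
                graph[u][v] = sim
                graph[v][u] = sim
    return graph


def propagate_labels(graph, max_iterations=20):
    """Stand-in clustering: each node adopts the label with the highest
    total edge weight among its neighbours; ties go to the smallest label."""
    labels = {node: node for node in graph}
    for _ in range(max_iterations):
        changed = False
        for node in sorted(graph):
            if not graph[node]:
                continue  # isolated words keep their own label
            scores = Counter()
            for neighbour, weight in graph[node].items():
                scores[labels[neighbour]] += weight
            best = min(scores, key=lambda label: (-scores[label], label))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    clusters = defaultdict(list)
    for node, label in labels.items():
        clusters[label].append(node)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical toy corpus; a real system would use a large raw-text corpus.
TOY_CORPUS = [
    "the cat sleeps", "the dog sleeps", "a cat runs",
    "a dog runs", "the dog runs", "a cat sleeps",
]

clusters = propagate_labels(similarity_graph(context_vectors(TOY_CORPUS)))
print(clusters)
```

On this toy input, words that share left and right neighbours end up in the same cluster, which gives a rough picture of how word classes can be induced without a predefined tag set. The second partitioning, the Viterbi tagger, and the morphological component mentioned in the abstract are not covered by this sketch.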