Unsupervised part-of-speech disambiguation for high frequency words and its influence on unsupervised parsing

  • Authors:
  • Christian Hänig

  • Affiliations:
  • Natural Language Processing Group, Department of Computer Science, University of Leipzig, Leipzig, Germany

  • Venue:
  • CICLing'10: Proceedings of the 11th International Conference on Computational Linguistics and Intelligent Text Processing
  • Year:
  • 2010

Abstract

Current unsupervised part-of-speech tagging algorithms build context vectors that use high-frequency words as features, and cluster words into classes according to their context vectors. While part-of-speech disambiguation for mid- and low-frequency words is achieved by applying a Hidden Markov Model, no corresponding method is applied to high-frequency terms. Yet these are exactly the words that are essential for analyzing the syntactic dependencies of natural language. We therefore introduce an approach that employs unsupervised clustering of contexts to detect and separate a word's different syntactic roles. Experiments on German and English corpora show how this methodology addresses and solves some of the major problems of unsupervised part-of-speech tagging.
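
The idea of clustering a word's contexts can be illustrated with a toy sketch. This is not the paper's algorithm: the abstract does not specify the clustering procedure, so the code below assumes a simple greedy ("leader") clustering over the immediate left and right neighbours of each occurrence, with cosine similarity and an arbitrary threshold. The names `cosine`, `cluster_contexts`, `threshold` and `features` are hypothetical and chosen for illustration only.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[key] for key, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_contexts(tokens, target, threshold=0.4, features=None):
    """Group the occurrences of `target` by the similarity of their contexts.

    Assumption (not from the paper): a context is the immediate left and
    right neighbour, and clustering is greedy. If `features` is given,
    only neighbours in that set (e.g. high-frequency words) are counted.
    Returns a list of clusters, each a list of token positions.
    """
    clusters = []  # each entry: (summed context Counter, [positions])
    for i, word in enumerate(tokens):
        if word != target:
            continue
        context = Counter()
        if i > 0 and (features is None or tokens[i - 1] in features):
            context[("L", tokens[i - 1])] += 1
        if i + 1 < len(tokens) and (features is None or tokens[i + 1] in features):
            context[("R", tokens[i + 1])] += 1

        # Pick the most similar existing cluster at or above the threshold.
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(context, cluster[0])
            if sim >= best_sim:
                best, best_sim = cluster, sim

        if best is None:
            clusters.append((context, [i]))  # start a new cluster
        else:
            best[0].update(context)  # add counts to the cluster's summed vector
            best[1].append(i)
    return [positions for _, positions in clusters]


tokens = "i want to go to the park and you want to run to the shop".split()
print(cluster_contexts(tokens, "to"))  # → [[2, 10], [4, 12]]
```

In this toy input, the occurrences of "to" after "want" (infinitive marker) end up in one cluster and those before "the" (preposition) in another. Real corpora would need richer context features and a more robust clustering method than this sketch assumes.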