Unsupervised part-of-speech disambiguation for high frequency words and its influence on unsupervised parsing

Authors:
Christian Hänig
Affiliations:
Natural Language Processing Group, Department of Computer Science, University of Leipzig, Leipzig, Germany
Venue:
CICLing'10 Proceedings of the 11th international conference on Computational Linguistics and Intelligent Text Processing
Year:
2010

Citing 6
Cited 0

TnT: a statistical part-of-speech tagger

ANLC '00 Proceedings of the sixth conference on Applied natural language processing
Distributional part-of-speech tagging

EACL '95 Proceedings of the seventh conference on European chapter of the Association for Computational Linguistics
An all-subtrees approach to unsupervised parsing

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Toward unsupervised whole-corpus tagging

COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Unsupervised part-of-speech tagging employing efficient graph clustering

COLING ACL '06 Proceedings of the 21st International Conference on computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop
Chinese whispers: an efficient graph clustering algorithm and its application to natural language processing problems

TextGraphs-1 Proceedings of the First Workshop on Graph Based Methods for Natural Language Processing

Quantified Score

Hi-index	0.00

Visualization

Abstract

Current unsupervised part-of-speech tagging algorithms build context vectors containing high frequency words as features and cluster words – regarding to their context vectors – into classes. While part-of-speech disambiguation for mid and low frequency words is achieved by applying a Hidden Markov Model, no corresponding method is applied to high frequency terms. But those are exactly the words being essential for analyzing syntactic dependencies of natural language. Thus, we want to introduce an approach employing unsupervised clustering of contexts to detect and separate a word's different syntactic roles. Experiments on German and English corpora show how this methodology addresses and solves some of the major problems of unsupervised part-of-speech tagging.