Unsupervised part of speech tagging using unambiguous substitutes from a statistical language model

Authors:
Mehmet Ali Yatbaz;Deniz Yuret
Affiliations:
Koç University;Koç University
Venue:
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
Year:
2010

Citing 8
Cited 0

Building a large annotated corpus of English: the penn treebank

Computational Linguistics - Special issue on using large corpora: II
Tagging English text with a probabilistic model

Computational Linguistics
Contrastive estimation: training log-linear models on unlabeled data

ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Part of speech tagging in context

COLING '04 Proceedings of the 20th international conference on Computational Linguistics
KU: word sense disambiguation by substitution

SemEval '07 Proceedings of the 4th International Workshop on Semantic Evaluations
USYD: WSD and lexical substitution using the Web1T corpus

SemEval '07 Proceedings of the 4th International Workshop on Semantic Evaluations
Minimized models for unsupervised part-of-speech tagging

ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1
The noisy channel model for unsupervised word sense disambiguation

Computational Linguistics

Quantified Score

Hi-index	0.00

Visualization

Abstract

We show that unsupervised part of speech tagging performance can be significantly improved using likely substitutes for target words given by a statistical language model. We choose unambiguous substitutes for each occurrence of an ambiguous target word based on its context. The part of speech tags for the unambiguous substitutes are then used to filter the entry for the target word in the word--tag dictionary. A standard HMM model trained using the filtered dictionary achieves 92.25% accuracy on a standard 24,000 word corpus.