A simple unsupervised learner for POS disambiguation rules given only a minimal lexicon

Authors:
Qiuye Zhao;Mitch Marcus
Affiliations:
University of Pennsylvania;University of Pennsylvania
Venue:
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2
Year:
2009

Abstract

We propose a new model for unsupervised POS tagging based on linguistic distinctions between open and closed-class items. Exploiting notions from current linguistic theory, the system uses far less information than previous systems, far simpler computational methods, and far sparser descriptions in learning contexts. By applying simple language acquisition techniques based on counting, the system is given the closed-class lexicon, acquires a large open-class lexicon and then acquires disambiguation rules for both. This system achieves a 20% error reduction for POS tagging over state-of-the-art unsupervised systems tested under the same conditions, and achieves comparable accuracy when trained with much less prior information.
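The "20% error reduction" in the abstract is most naturally read as a relative reduction in error rate, not a 20-point gain in accuracy. The sketch below shows that arithmetic. The function name and the two accuracy values are hypothetical, chosen only for illustration; the abstract does not report the underlying accuracies.

```python
def relative_error_reduction(baseline_accuracy: float, new_accuracy: float) -> float:
    """Return the fraction of the baseline's errors that the new system removes."""
    baseline_error = 1.0 - baseline_accuracy
    new_error = 1.0 - new_accuracy
    if baseline_error <= 0.0:
        raise ValueError("baseline has no errors to reduce")
    return (baseline_error - new_error) / baseline_error


# Hypothetical accuracies (not from the paper), used only to show the arithmetic.
reduction = relative_error_reduction(baseline_accuracy=0.90, new_accuracy=0.92)
print(f"{reduction:.0%}")
```

Under these assumed numbers, moving from 90% to 92% accuracy removes one fifth of the baseline's errors, which is what a 20% relative error reduction means.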