Tagging English text with a probabilistic model
Computational Linguistics
Unsupervised learning of the morphology of a natural language
Computational Linguistics
Distributional part-of-speech tagging
EACL '95 Proceedings of the seventh conference on European chapter of the Association for Computational Linguistics
Combining distributional and morphological information for part of speech induction
EACL '03 Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 1
Inducing syntactic categories by context distribution clustering
ConLL '00 Proceedings of the 2nd workshop on Learning language in logic and the 4th conference on Computational natural language learning - Volume 7
Contrastive estimation: training log-linear models on unlabeled data
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Prototype-driven learning for sequence models
HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
Unsupervised multilingual learning for POS tagging
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Simple type-level unsupervised POS tagging
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Adaptive Bayesian HMM for Fully Unsupervised Chinese Part-of-Speech Induction
ACM Transactions on Asian Language Information Processing (TALIP)
Type-supervised hidden Markov models for part-of-speech tagging with incomplete tag dictionaries
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Hi-index | 0.01 |
This paper examines unsupervised approaches to part-of-speech (POS) tagging for morphologically-rich, resource-scarce languages, with an emphasis on Goldwater and Griffiths's (2007) fully-Bayesian approach originally developed for English POS tagging. We argue that existing unsupervised POS taggers unrealistically assume as input a perfect POS lexicon, and consequently, we propose a weakly supervised fully-Bayesian approach to POS tagging, which relaxes the unrealistic assumption by automatically acquiring the lexicon from a small amount of POS-tagged data. Since such relaxation comes at the expense of a drop in tagging accuracy, we propose two extensions to the Bayesian framework and demonstrate that they are effective in improving a fully-Bayesian POS tagger for Bengali, our representative morphologically-rich, resource-scarce language.