Unsupervised Russian POS tagging with appropriate context

Authors:
Li Yang;Erik Peterson;John Chen;Yana Petrova;Rohini Srihari
Affiliations:
Janya Inc., Amherst, NY;Janya Inc., Amherst, NY;Janya Inc., Amherst, NY;Department of Linguistics, State University of New York at Buffalo, Buffalo, NY;Janya Inc., Amherst, NY
Venue:
TSD'11 Proceedings of the 14th international conference on Text, speech and dialogue
Year:
2011

Citing 9

Building probabilistic models for natural language

Tagging English text with a probabilistic model. Computational Linguistics

A second-order Hidden Markov Model for part-of-speech tagging. ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics

Part of speech tagging in context. COLING '04 Proceedings of the 20th international conference on Computational Linguistics

Prototype-driven learning for sequence models. HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics

Minimized models for unsupervised part-of-speech tagging. ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1

Painless unsupervised learning with features. HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Improved unsupervised POS induction through prototype discovery. ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

Latent-descriptor clustering for unsupervised POS induction. EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

Abstract

We adopt the contextualized hidden Markov model (CHMM) framework for unsupervised Russian POS tagging and investigate whether left, right, and unambiguous context can be exploited within it. We propose a backoff smoothing method that incorporates all three types of context into transition probability estimation during expectation-maximization (EM) training. The resulting model achieves overall and disambiguation accuracies comparable to those of a CHMM that uses the classic backoff smoothing method for HMM-based POS tagging from [17].
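
To make the idea of smoothed transition probability estimation concrete, the following is a minimal sketch of a generic smoothing scheme for second-order HMM tag transitions. It is not the method proposed in the paper, nor the method from [17]: it uses plain linear interpolation of trigram, bigram, and unigram estimates as a stand-in, and it does not model left, right, or unambiguous context. The function name `interpolated_transition`, the fixed weights, and the use of hard counts from a tag sequence (rather than expected counts from an EM step) are all assumptions made for illustration.

```python
from collections import Counter


def interpolated_transition(tags, lambdas=(0.6, 0.3, 0.1)):
    """Return a function p(t3, t1, t2) that estimates P(t3 | t1, t2).

    Hypothetical illustration: linear interpolation of trigram, bigram,
    and unigram relative frequencies. The weights are arbitrary and must
    sum to 1; a real system would tune them or derive them from counts.
    """
    uni = Counter(tags)
    bi = Counter(zip(tags, tags[1:]))
    tri = Counter(zip(tags, tags[1:], tags[2:]))
    total = len(tags)

    # Context counts: how often each history is followed by any tag.
    bi_ctx = Counter()
    for (a, _), c in bi.items():
        bi_ctx[a] += c
    tri_ctx = Counter()
    for (a, b, _), c in tri.items():
        tri_ctx[(a, b)] += c

    l3, l2, l1 = lambdas

    def prob(t3, t1, t2):
        p1 = uni[t3] / total
        # Fall back to the lower-order estimate when a history is unseen,
        # so the result remains a proper distribution over t3.
        p2 = bi[(t2, t3)] / bi_ctx[t2] if bi_ctx[t2] > 0 else p1
        p3 = (tri[(t1, t2, t3)] / tri_ctx[(t1, t2)]
              if tri_ctx[(t1, t2)] > 0 else p2)
        return l3 * p3 + l2 * p2 + l1 * p1

    return prob


# Toy tag sequence (N = noun, V = verb, D = determiner), for illustration only.
p = interpolated_transition(["N", "V", "N", "V", "D", "N"])
print(p("N", "N", "V"))  # estimate of P(N | N, V)
```

In an EM setting, the integer counts above would be replaced by expected (fractional) counts collected in the E-step, and the smoothed estimates would be recomputed in each M-step.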