Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Building a large annotated corpus of English: the penn treebank
Computational Linguistics - Special issue on using large corpora: II
Tagging English text with a probabilistic model
Computational Linguistics
An empirical study of smoothing techniques for language modeling
ACL '96 Proceedings of the 34th annual meeting on Association for Computational Linguistics
Part of speech tagging in context
COLING '04 Proceedings of the 20th international conference on Computational Linguistics
CCGbank: A Corpus of CCG Derivations and Dependency Structures Extracted from the Penn Treebank
Computational Linguistics
Weakly supervised part-of-speech tagging for morphologically-rich, resource-scarce languages
EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
Minimized models for unsupervised part-of-speech tagging
ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1
Efficient optimization of an MDL-inspired objective function for unsupervised part-of-speech tagging
ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers
Crouching Dirichlet, hidden Markov model: unsupervised POS tagging with context local tag generation
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Two decades of unsupervised POS induction: how far have we come?
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Fast, greedy model minimization for unsupervised tagging
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Part-of-speech tagging from 97% to 100%: is it time for some linguistics?
CICLing'11 Proceedings of the 12th international conference on Computational linguistics and intelligent text processing - Volume Part I
Unsupervised part-of-speech tagging with bilingual graph-based projections
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Part-of-speech tagging for Twitter: annotation, features, and experiments
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Hi-index | 0.00 |
Past work on learning part-of-speech taggers from tag dictionaries and raw data has reported good results, but the assumptions made about those dictionaries are often unrealistic: due to historical precedents, they assume access to information about labels in the raw and test sets. Here, we demonstrate ways to learn hidden Markov model taggers from incomplete tag dictionaries. Taking the min-greedy algorithm (Ravi et al., 2010) as a starting point, we improve it with several intuitive heuristics. We also define a simple HMM emission initialization that takes advantage of the tag dictionary and raw data to capture both the openness of a given tag and its estimated prevalence in the raw data. Altogether, our augmentations produce improvements to performance over the original min-greedy algorithm for both English and Italian data.