Class-based n-gram models of natural language
Computational Linguistics
Building a large annotated corpus of English: the penn treebank
Computational Linguistics - Special issue on using large corpora: II
Tagging English text with a probabilistic model
Computational Linguistics
Combining distributional and morphological information for part of speech induction
EACL '03 Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 1
Contrastive estimation: training log-linear models on unlabeled data
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Prototype-driven learning for sequence models
HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
Unsupervised part-of-speech tagging employing efficient graph clustering
COLING ACL '06 Proceedings of the 21st International Conference on computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop
A comparison of Bayesian estimators for unsupervised Hidden Markov Model POS taggers
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Minimized models for unsupervised part-of-speech tagging
ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1
The infinite HMM for unsupervised PoS tagging
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2 - Volume 2
Unsupervised and constrained Dirichlet process mixture models for verb clustering
GEMS '09 Proceedings of the Workshop on Geometrical Models of Natural Language Semantics
Multilingual part-of-speech tagging: two unsupervised approaches
Journal of Artificial Intelligence Research
Painless unsupervised learning with features
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Unsupervised part-of-speech tagging with bilingual graph-based projections
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
A hierarchical Pitman-Yor process HMM for unsupervised part of speech induction
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Controlling complexity in part-of-speech induction
Journal of Artificial Intelligence Research
Evaluating unsupervised learning for natural language processing tasks
EMNLP '11 Proceedings of the First Workshop on Unsupervised Learning in NLP
A Bayesian mixture model for part-of-speech induction using multiple features
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Unsupervised dependency parsing without gold part-of-speech tags
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Unsupervised semantic role induction with graph partitioning
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
A hierarchical dirichlet process model for joint part-of-speech and morphology induction
NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
ROBUS-UNSUP '12 Proceedings of the Joint Workshop on Unsupervised and Semi-Supervised Learning in NLP
The PASCAL Challenge on Grammar Induction
WILS '12 Proceedings of the NAACL-HLT Workshop on the Induction of Linguistic Structure
Type-supervised hidden Markov models for part-of-speech tagging with incomplete tag dictionaries
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Learning syntactic categories using paradigmatic representations of word context
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
A lazy man's way to part-of-speech tagging
PKAW'12 Proceedings of the 12th Pacific Rim conference on Knowledge Management and Acquisition for Intelligent Systems
Hi-index | 0.00 |
Part-of-speech (POS) induction is one of the most popular tasks in research on unsupervised NLP. Many different methods have been proposed, yet comparisons are difficult to make since there is little consensus on evaluation framework, and many papers evaluate against only one or two competitor systems. Here we evaluate seven different POS induction systems spanning nearly 20 years of work, using a variety of measures. We show that some of the oldest (and simplest) systems stand up surprisingly well against more recent approaches. Since most of these systems were developed and tested using data from the WSJ corpus, we compare their generalization abilities by testing on both WSJ and the multilingual Multext-East corpus. Finally, we introduce the idea of evaluating systems based on their ability to produce cluster prototypes that are useful as input to a prototype-driven learner. In most cases, the prototype-driven learner outperforms the unsupervised system used to initialize it, yielding state-of-the-art results on WSJ and improvements on non-English corpora.