A hierarchical Pitman-Yor process HMM for unsupervised part of speech induction

Authors:
Phil Blunsom;Trevor Cohn
Affiliations:
University of Oxford;University of Sheffield
Venue:
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Year:
2011

Citing 18
Cited 6

Class-based n-gram models of natural language

Computational Linguistics
Building a large annotated corpus of English: the penn treebank

Computational Linguistics - Special issue on using large corpora: II
An efficient method for determining bilingual word classes

EACL '99 Proceedings of the ninth conference on European chapter of the Association for Computational Linguistics
An empirical study of smoothing techniques for language modeling

ACL '96 Proceedings of the 34th annual meeting on Association for Computational Linguistics
Combining distributional and morphological information for part of speech induction

EACL '03 Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 1
Contrastive estimation: training log-linear models on unlabeled data

ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Contextual dependencies in unsupervised word segmentation

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
A hierarchical Bayesian language model based on Pitman-Yor processes

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Prototype-driven learning for sequence models

HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
CoNLL-X shared task on multilingual dependency parsing

CoNLL-X '06 Proceedings of the Tenth Conference on Computational Natural Language Learning
A comparison of Bayesian estimators for unsupervised Hidden Markov Model POS taggers

EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Minimized models for unsupervised part-of-speech tagging

ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1
Type-based MCMC

HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Painless unsupervised learning with features

HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Posterior Regularization for Structured Latent Variable Models

The Journal of Machine Learning Research
Two decades of unsupervised POS induction: how far have we come?

EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Simple type-level unsupervised POS tagging

EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Inducing Tree-Substitution Grammars

The Journal of Machine Learning Research

A hierarchical dirichlet process model for joint part-of-speech and morphology induction

NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Unsupervised part of speech inference with particle filters

WILS '12 Proceedings of the NAACL-HLT Workshop on the Induction of Linguistic Structure
The PASCAL Challenge on Grammar Induction

WILS '12 Proceedings of the NAACL-HLT Workshop on the Induction of Linguistic Structure
A bayesian model for learning SCFGs with discontiguous rules

EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Learning syntactic categories using paradigmatic representations of word context

EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Unsupervised bayesian part of speech inference with particle gibbs

ECML PKDD'12 Proceedings of the 2012 European conference on Machine Learning and Knowledge Discovery in Databases - Volume Part I

Quantified Score

Hi-index	0.00

Visualization

Abstract

In this work we address the problem of unsupervised part-of-speech induction by bringing together several strands of research into a single model. We develop a novel hidden Markov model incorporating sophisticated smoothing using a hierarchical Pitman-Yor processes prior, providing an elegant and principled means of incorporating lexical characteristics. Central to our approach is a new type-based sampling algorithm for hierarchical Pitman-Yor models in which we track fractional table counts. In an empirical evaluation we show that our model consistently out-performs the current state-of-the-art across 10 languages.