Type-based MCMC

Authors: Percy Liang; Michael I. Jordan; Dan Klein
Affiliations: UC Berkeley; UC Berkeley; UC Berkeley
Venue: HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Year: 2010

Abstract

Most existing algorithms for learning latent-variable models, such as EM and existing Gibbs samplers, are token-based: they update the variables associated with one sentence at a time. Because these updates are incremental, the methods are susceptible to local optima and slow mixing. In this paper, we introduce a type-based sampler, which updates a block of variables that is identified by a type and spans multiple sentences. We show improvements on part-of-speech induction, word segmentation, and learning tree-substitution grammars.
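
To make the token-based versus type-based contrast concrete, here is a minimal Python sketch on a toy model. It is an illustration under stated assumptions, not the paper's implementation: the model is a single set of n exchangeable binary variables with a Beta(a, b) prior on their shared success probability, which stands in for "all sites of one type". The function names (`token_sweep`, `count_log_weights`, `type_block_update`) are hypothetical. The token-style sweep resamples one variable at a time given the rest, while the block update first samples how many variables are set to 1 and then picks which ones uniformly at random.

```python
import math
import random


def log_beta(x, y):
    """Log of the Beta function B(x, y)."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def token_sweep(z, a, b, rng):
    """One token-style Gibbs sweep: resample each z[i] given all other sites."""
    n = len(z)
    m = sum(z)
    for i in range(n):
        m -= z[i]  # remove site i from the count of ones
        p_one = (a + m) / (a + b + n - 1)  # collapsed Beta-Bernoulli conditional
        z[i] = 1 if rng.random() < p_one else 0
        m += z[i]
    return z


def count_log_weights(n, a, b):
    """Unnormalized log-probability that exactly m of the n sites equal 1.

    Each particular assignment with m ones has probability proportional to
    B(a + m, b + n - m), and there are C(n, m) such assignments.
    """
    return [
        math.log(math.comb(n, m)) + log_beta(a + m, b + n - m)
        for m in range(n + 1)
    ]


def type_block_update(z, a, b, rng):
    """Block update (toy assumption): resample all n sites jointly."""
    n = len(z)
    log_w = count_log_weights(n, a, b)
    top = max(log_w)
    weights = [math.exp(lw - top) for lw in log_w]  # stabilized weights
    m = rng.choices(range(n + 1), weights=weights)[0]  # how many ones
    ones = set(rng.sample(range(n), m))  # which sites, chosen uniformly
    for i in range(n):
        z[i] = 1 if i in ones else 0
    return z


if __name__ == "__main__":
    rng = random.Random(0)
    z = [0] * 20
    # With a small a and b the posterior over the count is bimodal
    # (mostly zeros or mostly ones).
    for _ in range(5):
        print("token sweep, ones:", sum(token_sweep(z, 0.1, 0.1, rng)))
    for _ in range(5):
        print("block update, ones:", sum(type_block_update(z, 0.1, 0.1, rng)))
```

In this toy setting the block update draws the count of ones directly from its exact distribution, so it can move between the "mostly 0" and "mostly 1" modes in one step, whereas the one-at-a-time sweep has to pass through the low-probability mixed configurations in between.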