Annealing techniques for unsupervised statistical language learning

  • Authors:
  • Noah A. Smith; Jason Eisner

  • Affiliations:
  • Johns Hopkins University, Baltimore, MD; Johns Hopkins University, Baltimore, MD

  • Venue:
  • ACL '04 Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics
  • Year:
  • 2004

Abstract

Exploiting unannotated natural language data is hard largely because unsupervised parameter estimation is hard. We describe deterministic annealing (DA; Rose et al., 1990) as an appealing alternative to the Expectation-Maximization (EM) algorithm (Dempster et al., 1977). Seeking to avoid search error, DA begins by globally maximizing an easy concave function and maintains a local maximum as it gradually morphs the function into the desired non-concave likelihood function. Applying DA to parsing and tagging models is shown to be straightforward; significant improvements over EM are shown on a part-of-speech tagging task. We describe a variant, skewed DA, which can incorporate a good initializer when it is available, and show significant improvements over EM on a grammar induction task.
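
To make the annealing idea concrete, here is a minimal sketch of DA-EM on a toy problem: a two-component 1-D Gaussian mixture rather than the paper's parsing and tagging models. The E-step posteriors are raised to an inverse temperature beta and renormalized; at beta near 0 the objective is flat and easy to maximize, and beta is then raised toward 1, recovering ordinary EM on the true likelihood. The model, initialization, and annealing schedule here are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Univariate Gaussian density, broadcasting over points and components."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def da_em(x, betas=np.linspace(0.1, 1.0, 10), iters_per_beta=20):
    """DA-EM for a 2-component 1-D Gaussian mixture (illustrative sketch)."""
    # Deliberately poor, near-symmetric initialization; DA is meant to
    # be robust to this, where plain EM can get stuck.
    mu = np.array([0.0, 0.1])
    sigma = np.array([1.0, 1.0])
    pi = np.array([0.5, 0.5])
    for beta in betas:                       # anneal beta from ~0 up to 1
        for _ in range(iters_per_beta):
            # Joint p(x_i, z=k) for each point/component: shape (N, K).
            joint = pi[None, :] * gaussian_pdf(x[:, None], mu[None, :], sigma[None, :])
            # DA E-step: flatten posteriors by raising to the power beta,
            # then renormalize. beta = 1 gives the standard EM E-step.
            post = joint ** beta
            post /= post.sum(axis=1, keepdims=True)
            # M-step: standard weighted maximum-likelihood updates.
            nk = post.sum(axis=0)
            pi = nk / nk.sum()
            mu = (post * x[:, None]).sum(axis=0) / nk
            sigma = np.sqrt((post * (x[:, None] - mu[None, :]) ** 2).sum(axis=0) / nk)
            sigma = np.maximum(sigma, 1e-3)  # guard against variance collapse
    return pi, mu, sigma

# Usage: recover two well-separated components from a bad starting point.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 0.5, 500), rng.normal(3, 1.0, 500)])
pi, mu, sigma = da_em(x)
print("weights:", pi, "means:", mu, "stdevs:", sigma)
```

The skewed-DA variant mentioned in the abstract would further bias the early, high-temperature E-steps toward a supplied initializer; that refinement is omitted from this sketch.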