Contrastive estimation: training log-linear models on unlabeled data

  • Authors:
  • Noah A. Smith; Jason Eisner

  • Affiliations:
  • Johns Hopkins University, Baltimore, MD; Johns Hopkins University, Baltimore, MD

  • Venue:
  • ACL '05: Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics
  • Year:
  • 2005

Abstract

Conditional random fields (Lafferty et al., 2001) are quite effective at sequence labeling tasks like shallow parsing (Sha and Pereira, 2003) and named-entity extraction (McCallum and Li, 2003). CRFs are log-linear, allowing the incorporation of arbitrary features into the model. To train on unlabeled data, we require unsupervised estimation methods for log-linear models; few exist. We describe a novel approach, contrastive estimation. We show that the new technique can be intuitively understood as exploiting implicit negative evidence and is computationally efficient. Applied to a sequence labeling problem---POS tagging given a tagging dictionary and unlabeled text---contrastive estimation outperforms EM (with the same feature set), is more robust to degradations of the dictionary, and can largely recover by modeling additional features.
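A sketch of the objective behind the abstract's "implicit negative evidence" (this display is not part of the original abstract, and the notation is assumed here): for a log-linear model with feature vector f and weights \theta, and a neighborhood function N(\cdot) that supplies perturbed variants of each observed sentence x_i as implicit negative examples, contrastive estimation maximizes

  \sum_{i=1}^{n} \log
    \frac{\sum_{y \in \mathcal{Y}} \exp\big(\theta \cdot f(x_i, y)\big)}
         {\sum_{x' \in N(x_i)} \sum_{y \in \mathcal{Y}} \exp\big(\theta \cdot f(x', y)\big)}

where y ranges over candidate label sequences. Because the denominator sums only over the (small) neighborhood of each sentence rather than over all possible inputs, the normalizer remains tractable; the paper itself defines the specific neighborhood functions used.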