Language models as representations for weakly-supervised NLP tasks

Authors:
Fei Huang;Alexander Yates;Arun Ahuja;Doug Downey
Affiliations:
Temple University, Philadelphia, PA;Temple University, Philadelphia, PA;Northwestern University, Evanston, IL;Northwestern University, Evanston, IL
Venue:
CoNLL '11 Proceedings of the Fifteenth Conference on Computational Natural Language Learning
Year:
2011

Abstract

Finding the right representation for words is critical for building accurate NLP systems when domain-specific labeled data for the task is scarce. This paper investigates language model representations, in which language models trained on unlabeled corpora are used to generate real-valued feature vectors for words. We investigate n-gram models and probabilistic graphical models, including a novel lattice-structured Markov Random Field. Experiments indicate that language model representations outperform traditional representations, and that graphical model representations outperform n-gram models, especially on sparse and polysemous words.
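
To make the idea of a language model representation concrete, here is a minimal sketch of how a bigram model estimated from unlabeled text could yield a real-valued feature vector per word. It is an illustration only, not the paper's actual models or feature definitions: the function name `bigram_representations`, the `context_vocab` parameter, the add-alpha smoothing, and the extra "other" bucket are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def bigram_representations(sentences, context_vocab, alpha=1.0):
    """Map each word to a vector of smoothed P(next = c | word).

    Hypothetical helper: one vector entry per word in context_vocab,
    plus a final entry for any successor outside that vocabulary.
    """
    # Count how often each word is followed by each other word.
    follow = defaultdict(Counter)
    for sent in sentences:
        for left, right in zip(sent, sent[1:]):
            follow[left][right] += 1

    vocab = list(context_vocab)
    vocab_set = set(vocab)
    reps = {}
    for word, counts in follow.items():
        # Add-alpha smoothing over the context vocabulary plus the "other" bucket.
        total = sum(counts.values()) + alpha * (len(vocab) + 1)
        other = sum(n for w, n in counts.items() if w not in vocab_set)
        vec = [(counts[c] + alpha) / total for c in vocab]
        vec.append((other + alpha) / total)
        reps[word] = vec
    return reps


# Toy unlabeled corpus, tokenized into sentences.
corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["a", "cat", "ran"],
]
reps = bigram_representations(corpus, ["cat", "dog", "sat", "ran"])
print(reps["cat"])
```

Each vector is a probability distribution over following words, so it can be passed as dense features to a supervised learner trained on a small labeled set. Words that never occur with a successor in the corpus get no vector in this sketch. A graphical model representation would instead derive features from latent states inferred for each token, which this sketch does not attempt.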