Distributional representations for handling sparsity in supervised sequence-labeling

Authors:
Fei Huang; Alexander Yates
Affiliations:
Temple University, Wachman Hall (both authors)
Venue:
ACL '09: Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, Volume 1
Year:
2009

Abstract

Supervised sequence-labeling systems in natural language processing often suffer from data sparsity because they use word types as features in their prediction tasks. Consequently, they have difficulty estimating parameters for types that appear in the test set but seldom (or never) appear in the training set. We demonstrate that distributional representations of word types, trained on unannotated text, can be used to improve performance on rare words. We incorporate aspects of these representations into the feature space of our sequence-labeling systems. In an experiment on a standard chunking dataset, our best technique improves a chunker from 0.76 F1 to 0.86 F1 on chunks beginning with rare words. On the same dataset, it improves our part-of-speech tagger from 74% to 80% accuracy on rare words. Furthermore, our system improves significantly over a baseline system when applied to text from a different domain, and it reduces the sample complexity of sequence labeling.
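The general idea of adding distributional information to a token's feature set can be illustrated with a minimal sketch. It assumes the simplest possible distributional representation: counts of each word type's immediate left and right neighbours in unannotated text, converted into relative-frequency features. This is an illustration only, not necessarily the representation used in the paper. The function names `build_context_vectors` and `token_features`, the boundary markers, the lowercasing, and the `top_k` cutoff are all hypothetical choices.

```python
from collections import Counter, defaultdict

# Hypothetical sentence-boundary markers used to pad each sentence.
BOS, EOS = "<s>", "</s>"


def build_context_vectors(sentences):
    """Count the immediate left and right neighbours of every word type.

    `sentences` is an iterable of token lists from unannotated text.
    Returns {word_type: Counter({("L", w): count, ("R", w): count, ...})}.
    Lowercasing is an illustrative assumption, not taken from the paper.
    """
    vectors = defaultdict(Counter)
    for tokens in sentences:
        padded = [BOS] + [t.lower() for t in tokens] + [EOS]
        for i in range(1, len(padded) - 1):
            word = padded[i]
            vectors[word][("L", padded[i - 1])] += 1
            vectors[word][("R", padded[i + 1])] += 1
    return vectors


def token_features(word, vectors, top_k=10):
    """Features for one token: its identity plus its type's top-k contexts.

    Context features hold relative frequencies, so a word that is rare in
    the labeled data can still share features with frequent words that
    occur in similar contexts in the unannotated text.
    """
    word = word.lower()
    feats = {"word=" + word: 1.0}
    contexts = vectors.get(word)  # .get() does not insert into the defaultdict
    if not contexts:
        return feats  # type never seen in the unannotated text either
    total = sum(contexts.values())
    for (side, ctx), count in contexts.most_common(top_k):
        feats[f"{side}={ctx}"] = count / total
    return feats
```

For example, with the toy unannotated sentences `the cat sat`, `the dog sat`, and `a dog ran`, `token_features("cat", ...)` returns `{"word=cat": 1.0, "L=the": 0.5, "R=sat": 0.5}`, and `dog` also receives `L=the` and `R=sat` features. A sequence labeler (for instance, a CRF) that sees these dictionaries as per-token observation features could therefore transfer weights learned for `dog` to `cat` even if `cat` never occurs in the labeled training data.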