Current statistical parsers tend to perform well only on their training domain and closely related genres. While strong performance on a few related domains suffices for many applications, parsers should ideally generalize to a wide variety of domains. When parsing document collections that span heterogeneous domains (e.g., the web), the optimal parsing model for each document is typically not obvious. We study this problem as a new task: multiple-source parser adaptation. Our system trains on corpora from many different domains, learning not only the statistics of those domains but also quantitative measures of domain differences and how those differences affect parsing accuracy. Given a specific target text, the system proposes a linear combination of parsing models trained on the source corpora. Tested across six domains, our system outperforms all non-oracle baselines, including the best domain-independent parsing model, demonstrating the value of customizing parsing models to specific domains.
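The core idea, combining source-domain parsing models with weights driven by a measure of domain similarity to the target text, can be sketched as below. This is a minimal illustration, not the paper's actual system: the bag-of-words cosine similarity, the normalization into convex mixture weights, and all function names are illustrative assumptions.

```python
import math
from collections import Counter


def cosine_similarity(counts_a, counts_b):
    """Cosine similarity between two bag-of-words count vectors
    (a simple stand-in for a learned domain-difference measure)."""
    shared = set(counts_a) & set(counts_b)
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mixture_weights(target_text, source_corpora):
    """Weight each source domain by its lexical similarity to the target,
    normalized so the weights form a convex combination (sum to 1)."""
    target_counts = Counter(target_text.split())
    sims = {name: cosine_similarity(target_counts, Counter(text.split()))
            for name, text in source_corpora.items()}
    total = sum(sims.values())
    if total == 0.0:  # no overlap: fall back to a uniform mixture
        return {name: 1.0 / len(sims) for name in sims}
    return {name: s / total for name, s in sims.items()}


def combined_score(parse, models, weights):
    """Linearly interpolate per-domain model scores for one candidate parse.
    `models` maps a domain name to a scoring function (hypothetical here)."""
    return sum(weights[name] * models[name](parse) for name in models)
```

In the real system the weights are learned from how measured domain differences correlate with parsing accuracy; the sketch substitutes a fixed similarity heuristic to keep the example self-contained.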