Adapting a lexicalized-grammar parser to contrasting domains

Authors:
Laura Rimell;Stephen Clark
Affiliations:
Oxford University, Oxford, UK;Oxford University, Oxford, UK
Venue:
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Year:
2008

Citing 16
Cited 9

The syntactic process

The syntactic process
Building a large annotated corpus of English: the penn treebank

Computational Linguistics - Special issue on using large corpora: II
Supertagging: an approach to almost parsing

Computational Linguistics
A maximum-entropy-inspired parser

NAACL 2000 Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference
Three generative, lexicalised models for statistical parsing

ACL '98 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
The role of lexico-semantic feedback in open-domain textual question-answering

ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
Probabilistic disambiguation models for wide-coverage HPSG parsing

ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
QuestionBank: creating a corpus of parse-annotated questions

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Multi-tagging for lexicalized-grammar parsing

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
The second release of the RASP system

COLING-ACL '06 Proceedings of the COLING/ACL on Interactive presentation sessions
CCGbank: A Corpus of CCG Derivations and Dependency Structures Extracted from the Penn Treebank

Computational Linguistics
Wide-coverage efficient statistical parsing with ccg and log-linear models

Computational Linguistics
On the unification of syntactic annotations under the stanford dependency scheme: a case study on BioInfer and GENIA

BioNLP '07 Proceedings of the Workshop on BioNLP 2007: Biological, Translational, and Clinical Language Processing
Extremely lexicalized models for accurate and fast HPSG parsing

EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
Evaluating impact of re-training a lexical disambiguation model on domain adaptation of an HPSG parser

IWPT '07 Proceedings of the 10th International Conference on Parsing Technologies
Parsing biomedical literature

IJCNLP'05 Proceedings of the Second international joint conference on Natural Language Processing

Unbounded dependency recovery for parser evaluation

EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2 - Volume 2
Faster parsing by supertagger adaptation

ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Creating robust supervised classifiers via web-scale N-gram data

ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Unsupervised parse selection for HPSG

EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Reducing overdetections in a French symbolic grammar checker by classification

CICLing'11 Proceedings of the 12th international conference on Computational linguistics and intelligent text processing - Volume Part II
Parsing natural language queries for life science knowledge

BioNLP '11 Proceedings of BioNLP 2011 Workshop
Semi-supervised CCG lexicon extension

EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Minimally supervised domain-adaptive parse reranking for relation extraction

IWPT '11 Proceedings of the 12th International Conference on Parsing Technologies
Improved parsing and POS tagging using inter-sentence consistency constraints

EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

Quantified Score

Hi-index	0.00

Visualization

Abstract

Most state-of-the-art wide-coverage parsers are trained on newspaper text and suffer a loss of accuracy in other domains, making parser adaptation a pressing issue. In this paper we demonstrate that a CCG parser can be adapted to two new domains, biomedical text and questions for a QA system, by using manually-annotated training data at the pos and lexical category levels only. This approach achieves parser accuracy comparable to that on newspaper data without the need for annotated parse trees in the new domain. We find that retraining at the lexical category level yields a larger performance increase for questions than for biomedical text and analyze the two datasets to investigate why different domains might behave differently for parser adaptation.