QuestionBank: creating a corpus of parse-annotated questions

  • Authors:
  • John Judge, Aoife Cahill, Josef van Genabith

  • Affiliations:
  • Dublin City University, Dublin, Ireland (all authors); Josef van Genabith also with IBM Dublin, Ireland

  • Venue:
  • ACL-44: Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics
  • Year:
  • 2006

Abstract

This paper describes the development of QuestionBank, a corpus of 4000 parse-annotated questions for (i) use in training parsers employed in QA, and (ii) evaluation of question parsing. We present a series of experiments to investigate the effectiveness of QuestionBank as both an exclusive and a supplementary training resource for a state-of-the-art parser in parsing both question and non-question test sets. We introduce a new method for recovering empty nodes and their antecedents (capturing long distance dependencies) from parser output in CFG trees using LFG f-structure reentrancies. Our main findings are (i) using QuestionBank training data improves parser performance to a labelled bracketing f-score of 89.75%, an increase of almost 11% over the baseline; (ii) back-testing experiments on non-question data (Penn-II WSJ Section 23) show that the retrained parser does not suffer a performance drop on non-question material; (iii) ablation experiments show that the amount of training material provided by QuestionBank is sufficient to achieve optimal results; (iv) our method for recovering empty nodes captures long distance dependencies in questions from the ATIS corpus with high precision (96.82%) and low recall (39.38%). In summary, QuestionBank provides a useful new resource for parser-based QA research.
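As a reading aid (not part of the paper itself): the labelled bracketing f-score reported in the abstract is the standard PARSEVAL-style metric, computed from the overlap between gold and predicted labelled constituent spans. The minimal Python sketch below illustrates the computation; the function name, span representation, and example spans are hypothetical, chosen only to show how precision, recall, and f-score are derived.

```python
# Illustrative sketch (assumption: PARSEVAL-style labelled bracketing scoring,
# not code from the paper). Constituents are (label, start, end) span tuples.

from collections import Counter

def bracket_fscore(gold_spans, pred_spans):
    """Labelled bracketing precision, recall, and f-score.

    gold_spans, pred_spans: iterables of (label, start, end) tuples.
    Multiset intersection ensures duplicate spans are matched at most once each.
    """
    gold, pred = Counter(gold_spans), Counter(pred_spans)
    matched = sum((gold & pred).values())  # spans agreeing on label and extent
    precision = matched / sum(pred.values()) if pred else 0.0
    recall = matched / sum(gold.values()) if gold else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

# Toy usage with hypothetical spans for a six-word question:
gold = [("SBARQ", 0, 6), ("WHNP", 0, 1), ("SQ", 1, 6), ("NP", 2, 6)]
pred = [("SBARQ", 0, 6), ("WHNP", 0, 1), ("SQ", 1, 6), ("NP", 2, 4)]
print(bracket_fscore(gold, pred))  # -> (0.75, 0.75, 0.75)
```

Here the predicted NP span disagrees with the gold NP in extent, so three of four brackets match, giving precision = recall = f = 0.75; the paper's 89.75% figure is this f-score computed over its question test set.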