A data driven approach to query expansion in question answering

Authors:
Leon Derczynski;Jun Wang;Robert Gaizauskas;Mark A. Greenwood
Affiliations:
University of Sheffield, Regent Court, Sheffield, UK;University of Sheffield, Regent Court, Sheffield, UK;University of Sheffield, Regent Court, Sheffield, UK;University of Sheffield, Regent Court, Sheffield, UK
Venue:
IRQA '08 Coling 2008: Proceedings of the 2nd workshop on Information Retrieval for Question Answering
Year:
2008

Abstract

Automated answering of natural language questions is an interesting and useful problem to solve. Question answering (QA) systems often perform information retrieval at an initial stage. Information retrieval (IR) performance, provided by engines such as Lucene, places a bound on overall system performance. For example, no answer-bearing documents are retrieved at low ranks for almost 40% of questions. In this paper, answer texts from previous QA evaluations held as part of the Text REtrieval Conferences (TREC) are paired with queries and analysed in an attempt to identify performance-enhancing words. These words are then used to evaluate the performance of a query expansion method. Data driven extension words were found to help in over 70% of difficult questions. These words can be used to improve and evaluate query expansion methods. Simple blind relevance feedback (RF) was correctly predicted as unlikely to help overall performance, and a possible explanation is provided for its low value in IR for QA.
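
As a rough illustration of the blind relevance feedback idea mentioned in the abstract, the sketch below expands a query with the most frequent new terms found in the top-ranked documents of an initial retrieval run. It is a minimal sketch under stated assumptions, not the method evaluated in the paper: the function names, the tiny stopword list, and the `top_docs` and `n_terms` parameters are all hypothetical, and the ranked documents are plain strings rather than output from a real engine.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"the", "a", "an", "of", "in", "is", "was", "to", "and",
             "who", "what", "when", "where"}


def tokenize(text):
    """Lowercase the text and return its alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def blind_rf_expand(query, ranked_docs, top_docs=2, n_terms=2):
    """Return the query terms plus up to n_terms expansion terms.

    The top_docs highest-ranked documents are assumed to be relevant
    (hence "blind"); no relevance judgement is consulted.
    """
    query_terms = tokenize(query)
    excluded = set(query_terms) | STOPWORDS
    counts = Counter()
    for doc in ranked_docs[:top_docs]:
        counts.update(t for t in tokenize(doc) if t not in excluded)
    # Most frequent first; ties broken alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return query_terms + [term for term, _ in ranked[:n_terms]]


# Hypothetical ranked results for a toy question.
docs = [
    "Armstrong walked on the moon in 1969",
    "Neil Armstrong was the first man on the moon",
    "The moon orbits the Earth",
]
print(blind_rf_expand("first man on the moon", docs))
# → ['first', 'man', 'on', 'the', 'moon', 'armstrong', '1969']
```

Because the top-ranked documents are taken as relevant without checking, the expansion terms are only as good as the initial ranking. When no answer-bearing document appears among them, the added terms may pull the query further from the answer, which is one intuition for why blind RF can have low value in IR for QA.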