Evaluation is a major driving force in advancing the state of the art in language technologies. In particular, automatic assessment of the quality of machine output is the preferred way to measure progress, provided the metrics have been validated against human judgments. Following recent developments in the automatic evaluation of machine translation and document summarization, we present a similar approach, implemented in a measure called POURPRE, an automatic technique for evaluating answers to complex questions based on n-gram co-occurrences between machine output and a human-generated answer key. Until now, the only way to assess the correctness of answers to such questions has been to manually determine whether each information "nugget" appears in a system's response. This lack of automatic scoring methods is an impediment to progress in the field, which we address with this work. Experiments with the TREC 2003, TREC 2004, and TREC 2005 QA tracks indicate that rankings produced by our metric correlate highly with the official rankings, and that POURPRE outperforms direct application of existing metrics.
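To make the idea concrete, the Python sketch below illustrates one way n-gram co-occurrence can replace a binary human nugget judgment: each nugget receives a soft match score (the fraction of its terms found in the system response), which is then plugged into the standard TREC nugget F-score. This is a minimal sketch under assumptions not stated in the abstract (whitespace tokenization, unigram overlap, a 100-character length allowance per matched nugget, beta = 3); the function names are hypothetical, and the actual POURPRE measure includes further refinements.

    def match(nugget: str, response: str) -> float:
        """Soft match: fraction of nugget terms appearing in the response."""
        nugget_terms = nugget.lower().split()
        response_terms = set(response.lower().split())
        if not nugget_terms:
            return 0.0
        hits = sum(1 for t in nugget_terms if t in response_terms)
        return hits / len(nugget_terms)

    def nugget_fscore(response: str, vital: list[str], okay: list[str],
                      beta: float = 3.0) -> float:
        """TREC-style nugget F(beta) with soft (unigram-overlap) matching."""
        if not vital:
            return 0.0
        # Recall: average soft match over the vital nuggets only.
        r = sum(match(n, response) for n in vital) / len(vital)
        # Precision: length allowance of 100 characters per matched nugget
        # (vital or okay), penalizing responses that exceed it.
        allowance = 100 * sum(match(n, response) for n in vital + okay)
        length = len(response)
        p = 1.0 if length <= allowance else 1.0 - (length - allowance) / length
        if p + r == 0:
            return 0.0
        return (1 + beta**2) * p * r / (beta**2 * p + r)

Because the per-nugget score is continuous rather than binary, a response containing most but not all of a nugget's terms still earns partial credit, which is what lets the metric be computed without a human in the loop.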