A simple measure to assess non-response
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Formulating Question Answering Validation as a classification problem facilitates the introduction of Machine Learning techniques to improve the overall performance of Question Answering systems. The unequal proportion of positive and negative examples in the evaluation collections has motivated the use of measures based on precision and recall. However, an evaluation based on the analysis of Receiver Operating Characteristic (ROC) space is sometimes preferred for classification over unbalanced collections. In this article we compare both evaluation approaches according to their rationale, their stability, their discrimination power, and their suitability for the particularities of the Answer Validation task.
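The contrast between the two evaluation approaches can be illustrated with a minimal sketch (the counts below are hypothetical, not taken from the article): when the true-positive and false-positive rates are held fixed, a classifier's point in ROC space is unchanged as the collection becomes more skewed toward negatives, while its precision degrades.

```python
def metrics(tp, fp, fn, tn):
    """Precision/recall (collection-dependent) vs ROC coordinates (rate-based)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)   # recall equals the true-positive rate (TPR)
    fpr = fp / (fp + tn)      # false-positive rate, the x-axis of ROC space
    return precision, recall, fpr

# Balanced collection: 100 positives, 100 negatives, TPR = 0.8, FPR = 0.1
p1, r1, f1 = metrics(tp=80, fp=10, fn=20, tn=90)

# Skewed collection: same rates, but 100 positives against 1000 negatives
p2, r2, f2 = metrics(tp=80, fp=100, fn=20, tn=900)

print(p1, r1, f1)  # precision ~ 0.889 on the balanced collection
print(p2, r2, f2)  # precision ~ 0.444, although (TPR, FPR) is identical
```

The ROC point (FPR, TPR) is (0.1, 0.8) in both collections, so a ROC-space comparison is insensitive to the proportion of positive and negative examples, whereas precision halves on the skewed collection.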