A proposal on evaluation measures for RTE

Author: Richard Bergmair
Affiliation: University of Cambridge Computer Laboratory, Cambridge, UK
Venue: TextInfer '09: Proceedings of the 2009 Workshop on Applied Textual Inference
Year: 2009

Abstract

We outline problems with interpreting accuracy in the presence of bias, arguing that the issue is a particularly pressing concern for the evaluation of recognizing textual entailment (RTE) systems. Furthermore, we argue that average precision scores are unsuitable for RTE and should not be reported. We advocate mutual information as a new evaluation measure to be reported alongside accuracy and the confidence-weighted score.
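As a rough illustration of the point about bias, the sketch below computes accuracy, the confidence-weighted score (CWS) as defined for the PASCAL RTE challenges, and the empirical mutual information between gold labels and system predictions. The function names and the YES/NO label strings are hypothetical. The plug-in, base-2 estimate of mutual information is a standard choice assumed here, and the paper's exact formulation (for example, any normalization) may differ.

```python
import math
from collections import Counter


def accuracy(gold, pred):
    """Fraction of items whose predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def mutual_information(gold, pred):
    """Empirical (plug-in) mutual information I(gold; pred) in bits,
    estimated from joint and marginal label counts."""
    n = len(gold)
    joint = Counter(zip(gold, pred))
    gold_counts = Counter(gold)
    pred_counts = Counter(pred)
    mi = 0.0
    for (g, p), c in joint.items():
        # p(g, p) * log2( p(g, p) / (p(g) * p(p)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (gold_counts[g] * pred_counts[p]))
    return mi


def confidence_weighted_score(gold, pred, confidence):
    """CWS: rank judgments by decreasing confidence, then average,
    over ranks i = 1..n, the fraction correct among the top i.
    Ties in confidence keep their input order (sorted() is stable)."""
    order = sorted(range(len(gold)), key=lambda i: -confidence[i])
    correct_so_far = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        correct_so_far += gold[i] == pred[i]
        total += correct_so_far / rank
    return total / len(gold)


if __name__ == "__main__":
    # Hypothetical biased test set: 80% of pairs are entailments.
    gold = ["YES"] * 80 + ["NO"] * 20
    always_yes = ["YES"] * 100
    print(accuracy(gold, always_yes))            # 0.8
    print(mutual_information(gold, always_yes))  # 0.0
```

Under these assumptions, a system that always answers YES reaches 0.8 accuracy on such a test set while carrying no information about the gold labels. Mutual information makes this visible, whereas accuracy alone does not.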