We outline problems with interpreting accuracy in the presence of bias and argue that the issue is a particularly pressing concern for recognizing textual entailment (RTE) evaluation. We further argue that average precision scores are unsuitable for RTE and should not be reported. We advocate mutual information as a new evaluation measure, to be reported in addition to accuracy and confidence-weighted score.
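The abstract does not say exactly how mutual information is computed. The following minimal Python sketch assumes the plain empirical (plug-in) mutual information between gold labels and system labels, measured in bits. The test set, the YES/NO label names and both systems are hypothetical. They were chosen to show how accuracy can reward a system that simply exploits class bias. A system that always answers YES scores higher accuracy (0.80) than an informative system (0.75), but its predictions carry zero information about the gold labels. The informative system scores about 0.12 bits.

```python
import math
from collections import Counter


def accuracy(gold, pred):
    """Fraction of items whose predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def mutual_information(gold, pred):
    """Empirical mutual information I(gold; pred) in bits.

    Assumption: a plug-in estimate from joint label counts. This is the
    textbook definition, not necessarily the paper's exact formulation.
    """
    n = len(gold)
    joint = Counter(zip(gold, pred))   # counts of (gold, predicted) pairs
    gold_counts = Counter(gold)        # marginal counts of gold labels
    pred_counts = Counter(pred)        # marginal counts of predicted labels
    mi = 0.0
    for (g, p), c in joint.items():
        p_gp = c / n
        p_g = gold_counts[g] / n
        p_p = pred_counts[p] / n
        # Each observed cell contributes P(g,p) * log2(P(g,p) / (P(g) P(p))).
        mi += p_gp * math.log2(p_gp / (p_g * p_p))
    return mi


# Hypothetical, deliberately skewed test set: 80 entailment, 20 non-entailment.
gold = ["YES"] * 80 + ["NO"] * 20

# Hypothetical system that ignores its input and always answers YES.
always_yes = ["YES"] * 100

# Hypothetical system that actually discriminates between the classes:
# 60/80 YES items and 15/20 NO items correct.
sys_b = ["YES"] * 60 + ["NO"] * 20 + ["YES"] * 5 + ["NO"] * 15

if __name__ == "__main__":
    for name, pred in [("always YES", always_yes), ("system B", sys_b)]:
        print(f"{name}: accuracy={accuracy(gold, pred):.2f}, "
              f"MI={mutual_information(gold, pred):.3f} bits")
```

Mutual information is zero whenever the system's output is statistically independent of the gold label. A constant answer is the extreme case, so exploiting bias alone earns nothing, however skewed the test set is.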