Applying machine translation evaluation techniques to textual CBR

Authors:
Ibrahim Adeyanju;Nirmalie Wiratunga;Robert Lothian;Susan Craw
Affiliations:
School of Computing, Robert Gordon University, Aberdeen, Scotland, UK;School of Computing, Robert Gordon University, Aberdeen, Scotland, UK;School of Computing, Robert Gordon University, Aberdeen, Scotland, UK;School of Computing, Robert Gordon University, Aberdeen, Scotland, UK
Venue:
ICCBR'10 Proceedings of the 18th international conference on Case-Based Reasoning Research and Development
Year:
2010

Abstract

The need for automated text evaluation is common to several AI disciplines. In this work, we explore the use of Machine Translation (MT) evaluation metrics for Textual Case-Based Reasoning (TCBR). Both MT and TCBR typically propose textual solutions, and both rely on human reference texts for evaluation. Current TCBR evaluation metrics such as precision and recall employ a single human reference, but these metrics are misleading when semantically similar texts are expressed with different sets of keywords. MT metrics overcome this challenge by using multiple human references. Here, we explore the use of multiple references, as opposed to a single reference, applied to incident reports from the medical domain. These references are created introspectively from the original dataset using the CBR similarity assumption. Results indicate that evaluations of TCBR systems with these new metrics are closer to human judgements. The generated text in TCBR is typically similar in length to the reference, since it is a revised form of an actual solution to a similar problem, unlike in MT, where generated texts can sometimes be significantly shorter. We therefore found that some parameters in the MT evaluation measures are not useful for TCBR, owing to this intrinsic difference in the text generation process.
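To illustrate why multiple references matter, the following is a minimal sketch of a BLEU-style clipped n-gram precision. It is not the authors' implementation. The function name `clipped_precision`, the whitespace tokenisation, and the example sentences are all assumptions made for illustration; the sentences are invented and do not come from the paper's dataset.

```python
from collections import Counter


def clipped_precision(candidate, references, n=1):
    """BLEU-style modified n-gram precision against one or more references.

    Hypothetical helper for illustration: lowercases and splits on
    whitespace, which is far simpler than real tokenisation.
    """
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand_counts = ngrams(candidate.lower().split())
    if not cand_counts:
        return 0.0

    # For each n-gram, keep the largest count observed in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref.lower().split()).items():
            max_ref[gram] = max(max_ref[gram], count)

    # A candidate n-gram is credited at most as often as it appears in a reference.
    matched = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return matched / sum(cand_counts.values())


# Invented example texts (not from the paper).
candidate = "nurse gave patient the wrong drug"
reference_a = "staff administered incorrect medication to patient"
reference_b = "nurse gave the wrong medication"

single = clipped_precision(candidate, [reference_a])
multi = clipped_precision(candidate, [reference_a, reference_b])
print(f"single reference: {single:.2f}, two references: {multi:.2f}")
```

With only `reference_a`, the candidate shares a single keyword with the reference even though the two texts describe the same incident. Adding a second reference that uses different wording lets more of the candidate's words be matched, which is the effect the abstract attributes to multi-reference MT metrics.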