We present results from a new Interagency Language Roundtable (ILR)-based comprehension test. This new test design presents questions at multiple ILR difficulty levels within each document. We incorporated Arabic machine translation (MT) output from three independent research sites, arbitrarily merging these materials into one MT condition. We contrast the MT condition, for both text and audio data types, with high-quality human reference Gold Standard (GS) translations. Overall, subjects achieved 95% comprehension for GS and 74% for MT, across four genres and three difficulty levels. Surprisingly, comprehension rates do not correlate highly with translation error rates, suggesting that we are measuring an additional dimension of MT quality. We also observed that reading MT output takes 15% more time overall than reading GS.