The Defense Advanced Research Projects Agency (DARPA) Spoken Language Communication and Translation System for Tactical Use (TRANSTAC) program ( http://1.usa.gov/transtac ) faced many challenges in applying automated measures of translation quality to Iraqi Arabic-English speech translation dialogues. Features of speech data in general, and of Iraqi Arabic data in particular, undermine basic assumptions of automated measures that depend on matching system outputs to reference translations. We describe these features and the challenges they present for evaluating machine translation quality with automated metrics. We show that scores for translation into Iraqi Arabic correlate better with human judgments when they are computed from normalized system outputs and reference translations: orthographic normalization, lexical normalization, and operations involving light stemming all raised correlations with human judgments.
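To make the normalization idea concrete, the sketch below shows one way such preprocessing could look in Python. It is illustrative only, not the TRANSTAC pipeline: the orthographic mappings (collapsing alef/hamza variants, taa marbuta, and alef maqsura, stripping diacritics and tatweel) follow common Arabic NLP conventions, and the prefix/suffix lists for light stemming are a small hypothetical selection chosen for the example.

```python
# -*- coding: utf-8 -*-
"""Illustrative normalization of Arabic text before automated MT scoring.
A minimal sketch under assumed conventions; the affix lists below are
hypothetical and deliberately conservative."""
import re

# Orthographic normalization: map alef/hamza variants to bare alef,
# taa marbuta to haa, alef maqsura to yaa.
ORTHO_MAP = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda       -> bare alef
    "\u0629": "\u0647",  # taa marbuta           -> haa
    "\u0649": "\u064A",  # alef maqsura          -> yaa
})
# Short vowel diacritics (harakat) and tatweel, both removed.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")

# Light stemming: strip a few frequent clitic prefixes and suffixes
# (hypothetical lists: wal-, al-, wa- / -ha, -h, -in, -at).
PREFIXES = ("\u0648\u0627\u0644", "\u0627\u0644", "\u0648")
SUFFIXES = ("\u0647\u0627", "\u0647", "\u064A\u0646", "\u0627\u062A")

def light_stem(token: str) -> str:
    # Remove at most one prefix and one suffix, keeping a stem of >= 2 chars.
    for p in PREFIXES:
        if token.startswith(p) and len(token) - len(p) >= 2:
            token = token[len(p):]
            break
    for s in SUFFIXES:
        if token.endswith(s) and len(token) - len(s) >= 2:
            token = token[: -len(s)]
            break
    return token

def normalize(line: str) -> str:
    # Orthographic and lexical normalization, then light stemming per token.
    line = DIACRITICS.sub("", line).translate(ORTHO_MAP)
    return " ".join(light_stem(tok) for tok in line.split())
```

In a setup like this, both the system output and every reference translation would be passed through normalize() before computing BLEU or any other metric that matches outputs against references, so that superficial orthographic and morphological variation does not register as translation error.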