BLEU: a method for automatic evaluation of machine translation
ACL '02 Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics
Automatic evaluation of summaries using N-gram co-occurrence statistics
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Precision and recall of machine translation
NAACL-Short '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: companion volume of the Proceedings of HLT-NAACL 2003--short papers - Volume 2
HLT '02 Proceedings of the second international conference on Human Language Technology Research
(Meta-) evaluation of machine translation
StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
Metrics for MT evaluation: evaluating reordering
Machine Translation
Head finalization: a simple reordering rule for SOV languages
WMT '10 Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
LRscore for evaluating lexical and reordering quality in MT
WMT '10 Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
A lightweight evaluation framework for machine translation reordering
WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
AMBER: a modified BLEU, enhanced ranking metric
WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
Training a parser for machine translation reordering
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Akamon: an open source toolkit for tree/forest-based statistical machine translation
ACL '12 Proceedings of the ACL 2012 System Demonstrations
Learning to translate with multiple objectives
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
PORT: a precision-order-recall MT evaluation metric for tuning
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
Post-ordering by parsing for Japanese-English statistical machine translation
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2
Inducing a discriminative parser to optimize machine translation reordering
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Head finalization reordering for Chinese-to-Japanese machine translation
SSST-6 '12 Proceedings of the Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation
Improving AMBER, an MT evaluation metric
WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
Syntax-Based Post-Ordering for Efficient Japanese-to-English Translation
ACM Transactions on Asian Language Information Processing (TALIP)
Post-Ordering by Parsing with ITG for Japanese-English Statistical Machine Translation
ACM Transactions on Asian Language Information Processing (TALIP)
Distortion Model Based on Word Sequence Labeling for Statistical Machine Translation
ACM Transactions on Asian Language Information Processing (TALIP)
Automatic evaluation of machine translation (MT) quality is essential to developing high-quality MT systems. Various evaluation metrics have been proposed, and BLEU is now the de facto standard. However, for translation between distant language pairs such as Japanese and English, the most popular metrics (e.g., BLEU, NIST, PER, and TER) do not work well. Japanese and English have completely different word orders, so special care must be taken with word order in translation; otherwise, translations with the wrong word order are often misleading or incomprehensible. For instance, SMT-based Japanese-to-English translators tend to translate 'A because B' as 'B because A.' Word order is thus the most important problem in distant language translation, yet conventional evaluation metrics do not significantly penalize such word order mistakes, so optimizing these metrics leads to inadequate translations. In this paper, we propose an automatic evaluation metric based on rank correlation coefficients modified with precision. Our meta-evaluation on the NTCIR-7 PATMT JE task data shows that this metric outperforms conventional metrics.
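A minimal sketch of the idea behind such a metric — not the paper's exact formula, but an illustration of combining a rank correlation over word positions with unigram precision. The function name, the greedy word alignment, and the `alpha` precision exponent are all assumptions introduced here for illustration:

```python
from itertools import combinations

def rank_correlation_score(reference, hypothesis, alpha=0.25):
    """Illustrative rank-correlation MT score (hypothetical, not the
    paper's exact metric): normalized Kendall's tau over the reference
    positions of aligned hypothesis words, scaled by unigram precision."""
    ref_tokens = reference.split()
    hyp_tokens = hypothesis.split()
    # Greedily align each hypothesis word to its first unused
    # occurrence in the reference (an assumed, simplistic alignment).
    used = set()
    positions = []
    for w in hyp_tokens:
        for i, r in enumerate(ref_tokens):
            if r == w and i not in used:
                used.add(i)
                positions.append(i)
                break
    if len(positions) < 2:
        return 0.0
    # Fraction of aligned word pairs that preserve their relative order
    # (a normalized Kendall's tau in [0, 1]).
    pairs = list(combinations(positions, 2))
    concordant = sum(1 for a, b in pairs if a < b)
    tau = concordant / len(pairs)
    # Unigram precision penalizes hypotheses with spurious words.
    precision = len(positions) / len(hyp_tokens)
    return tau * precision ** alpha
```

Under this sketch, the 'A because B' vs. 'B because A' mistake above is penalized directly: a hypothesis with all the right words but reversed clause order gets a low rank correlation, whereas an n-gram metric like BLEU would still award it substantial credit.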