BLEU: a method for automatic evaluation of machine translation
ACL '02 Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics
Automatic evaluation of summaries using N-gram co-occurrence statistics
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Precision and recall of machine translation
NAACL-Short '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: companion volume of the Proceedings of HLT-NAACL 2003--short papers - Volume 2
HLT '02 Proceedings of the second international conference on Human Language Technology Research
(Meta-) evaluation of machine translation
StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
Metrics for MT evaluation: evaluating reordering
Machine Translation
Head finalization: a simple reordering rule for SOV languages
WMT '10 Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
LRscore for evaluating lexical and reordering quality in MT
WMT '10 Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
A lightweight evaluation framework for machine translation reordering
WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
AMBER: a modified BLEU, enhanced ranking metric
WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
Training a parser for machine translation reordering
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Akamon: an open source toolkit for tree/forest-based statistical machine translation
ACL '12 Proceedings of the ACL 2012 System Demonstrations
Learning to translate with multiple objectives
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
PORT: a precision-order-recall MT evaluation metric for tuning
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
Post-ordering by parsing for Japanese-English statistical machine translation
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2
Inducing a discriminative parser to optimize machine translation reordering
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Head finalization reordering for Chinese-to-Japanese machine translation
SSST-6 '12 Proceedings of the Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation
Improving AMBER, an MT evaluation metric
WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
Syntax-Based Post-Ordering for Efficient Japanese-to-English Translation
ACM Transactions on Asian Language Information Processing (TALIP)
Post-Ordering by Parsing with ITG for Japanese-English Statistical Machine Translation
ACM Transactions on Asian Language Information Processing (TALIP)
Distortion Model Based on Word Sequence Labeling for Statistical Machine Translation
ACM Transactions on Asian Language Information Processing (TALIP)
Automatic evaluation of machine translation (MT) quality is essential to developing high-quality MT systems. Various evaluation metrics have been proposed, and BLEU is now the de facto standard. However, for translation between distant language pairs such as Japanese and English, the most popular metrics (e.g., BLEU, NIST, PER, and TER) do not work well. Japanese and English have completely different word orders, so special care must be taken with word order in translation; otherwise, translations with the wrong word order are often misleading or incomprehensible. For instance, SMT-based Japanese-to-English translators tend to translate 'A because B' as 'B because A.' Word order is thus the most important problem in distant language translation, yet conventional evaluation metrics do not significantly penalize such word order mistakes, so optimizing these metrics leads to inadequate translations. In this paper, we propose an automatic evaluation metric based on rank correlation coefficients modified with precision. Our meta-evaluation on the NTCIR-7 PATMT JE task data shows that this metric outperforms conventional metrics.
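A minimal sketch of the idea behind such a metric — not the paper's exact formula, but an illustration of combining a rank correlation over word positions with unigram precision. The function name, the greedy word alignment, and the `alpha` precision exponent are all assumptions introduced here for illustration:

```python
from itertools import combinations

def rank_correlation_score(reference, hypothesis, alpha=0.25):
    """Illustrative rank-correlation MT score (hypothetical, not the
    paper's exact metric): normalized Kendall's tau over the reference
    positions of aligned hypothesis words, scaled by unigram precision."""
    ref_tokens = reference.split()
    hyp_tokens = hypothesis.split()
    # Greedily align each hypothesis word to its first unused
    # occurrence in the reference (an assumed, simplistic alignment).
    used = set()
    positions = []
    for w in hyp_tokens:
        for i, r in enumerate(ref_tokens):
            if r == w and i not in used:
                used.add(i)
                positions.append(i)
                break
    if len(positions) < 2:
        return 0.0
    # Fraction of aligned word pairs that preserve their relative order
    # (a normalized Kendall's tau in [0, 1]).
    pairs = list(combinations(positions, 2))
    concordant = sum(1 for a, b in pairs if a < b)
    tau = concordant / len(pairs)
    # Unigram precision penalizes hypotheses with spurious words.
    precision = len(positions) / len(hyp_tokens)
    return tau * precision ** alpha
```

Under this sketch, the 'A because B' vs. 'B because A' mistake above is penalized directly: a hypothesis with all the right words but reversed clause order gets a low rank correlation, whereas an n-gram metric like BLEU would still award it substantial credit.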