A linear space algorithm for computing maximal common subsequences. Communications of the ACM.
BLEU: a method for automatic evaluation of machine translation. ACL '02 Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics.
ORANGE: a method for evaluating automatic evaluation metrics for machine translation. COLING '04 Proceedings of the 20th International Conference on Computational Linguistics.
HLT-NAACL '06 Proceedings of the Main Conference on Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics.
Automatic Evaluation of Information Ordering: Kendall's Tau. Computational Linguistics.
Moses: open source toolkit for statistical machine translation. ACL '07 Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions.
Predicting success in machine translation. EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing.
(Meta-) evaluation of machine translation. StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation.
Meteor: an automatic metric for MT evaluation with high levels of correlation with human judgments. StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation.
Linguistic features for automatic evaluation of heterogeneous MT systems. StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation.
Further meta-evaluation of machine translation. StatMT '08 Proceedings of the Third Workshop on Statistical Machine Translation.
Findings of the 2009 workshop on statistical machine translation. StatMT '09 Proceedings of the Fourth Workshop on Statistical Machine Translation.
Textual entailment features for machine translation evaluation. StatMT '09 Proceedings of the Fourth Workshop on Statistical Machine Translation.
Chunk-based verb reordering in VSO sentences for Arabic-English statistical machine translation. WMT '10 Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR.
LRscore for evaluating lexical and reordering quality in MT. WMT '10 Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR.
The parameter-optimized ATEC metric for MT evaluation. WMT '10 Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR.
Automatic evaluation of translation quality for distant language pairs. EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing.
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1.
Automatic translation error analysis. TSD '11 Proceedings of the 14th International Conference on Text, Speech and Dialogue.
A lightweight evaluation framework for machine translation reordering. WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation.
Inducing sentence structure from parallel corpora for reordering. EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing.
Learning to translate with multiple objectives. ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1.
Modified distortion matrices for phrase-based statistical machine translation. ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1.
Forced derivation tree based model training to statistical machine translation. EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning.
Inducing a discriminative parser to optimize machine translation reordering. EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning.
Translating between dissimilar languages requires accounting for the divergent word orders each language uses to express the same semantic content. Reordering poses a serious problem for statistical machine translation systems and has generated a considerable body of research aimed at meeting its challenges. Direct evaluation of reordering requires automatic metrics that explicitly measure the quality of word order choices in translations; current metrics, such as BLEU, evaluate reordering only indirectly. We analyse the ability of current metrics to capture reordering performance, and then introduce permutation distance metrics as a direct method for measuring word order similarity between translations and reference sentences. By correlating all metrics with a novel method for eliciting human judgements of reordering quality, we show that current metrics are largely influenced by lexical choice and are unable to distinguish between different reordering scenarios. We also show that permutation distance metrics correlate very well with human judgements and are impervious to lexical differences.
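As a rough illustration of the idea (a minimal sketch, not the paper's exact formulation), one common permutation distance is Kendall's tau distance: given a permutation that maps each translation word to its position in the reference ordering, it counts the fraction of word pairs placed in the wrong relative order. The function name and the encoding of the alignment as a simple position list are assumptions for the example:

```python
from itertools import combinations

def kendall_tau_distance(perm):
    """Normalised Kendall's tau distance of a permutation from the
    monotone (identity) ordering: the fraction of element pairs in
    discordant order. 0.0 = identical word order, 1.0 = fully reversed."""
    n = len(perm)
    if n < 2:
        return 0.0
    discordant = sum(1 for i, j in combinations(range(n), 2)
                     if perm[i] > perm[j])
    return discordant / (n * (n - 1) / 2)

# perm[i] = position in the reference of the i-th translation word
print(kendall_tau_distance([0, 1, 2, 3]))  # 0.0 (same order as reference)
print(kendall_tau_distance([3, 2, 1, 0]))  # 1.0 (order fully inverted)
```

Because the distance is computed purely over word positions, swapping a word for a synonym leaves it unchanged, which is exactly the insensitivity to lexical choice the abstract attributes to permutation distance metrics.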