BLEU: a method for automatic evaluation of machine translation
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Automatic evaluation of machine translation quality using n-gram co-occurrence statistics
HLT '02 Proceedings of the second international conference on Human Language Technology Research
Decomposability of translation metrics for improved evaluation and efficient algorithms
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Meteor: an automatic metric for MT evaluation with high levels of correlation with human judgments
StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
Further meta-evaluation of machine translation
StatMT '08 Proceedings of the Third Workshop on Statistical Machine Translation
Fluency, adequacy, or HTER?: exploring different human judgments with a tunable MT metric
StatMT '09 Proceedings of the Fourth Workshop on Statistical Machine Translation
Robust machine translation evaluation with entailment features
ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1
The Meteor metric for automatic evaluation of machine translation
Machine Translation
The best lexical metric for phrase-based statistical MT system optimization
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
LRscore for evaluating lexical and reordering quality in MT
WMT '10 Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
METEOR-NEXT and the METEOR paraphrase tables: improved evaluation support for five target languages
WMT '10 Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
The DCU dependency-based metric in WMT-MetricsMATR 2010
WMT '10 Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
TESLA: translation evaluation of sentences with linear-programming-based analysis
WMT '10 Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
Automatic evaluation of translation quality for distant language pairs
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Findings of the 2011 Workshop on Statistical Machine Translation
WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
PORT: a precision-order-recall MT evaluation metric for tuning
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
Improving AMBER, an MT evaluation metric
WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
Hi-index | 0.00 |
This paper proposes a new automatic machine translation evaluation metric: AMBER, which is based on the metric BLEU but incorporates recall, extra penalties, and some text processing variants. There is very little linguistic information in AMBER. We evaluate its system-level correlation and sentence-level consistency scores with human rankings from the WMT shared evaluation task; AMBER achieves state-of-the-art performance.