The Meteor metric for automatic evaluation of machine translation

  • Authors:
  • Alon Lavie; Michael J. Denkowski

  • Affiliations:
  • Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA (both authors)

  • Venue:
  • Machine Translation

  • Year:
  • 2009

Abstract

The Meteor Automatic Metric for Machine Translation evaluation, originally developed and released in 2004, was designed with the explicit goal of producing sentence-level scores that correlate well with human judgments of translation quality. Several key design decisions were incorporated into Meteor in support of this goal. In contrast with IBM's Bleu, which uses only precision-based features, Meteor uses and emphasizes recall in addition to precision, a property that several metric evaluations have confirmed to be critical for high correlation with human judgments. Meteor also addresses the problem of reference-translation variability by using flexible word matching, which allows morphological variants and synonyms to count as legitimate correspondences. Furthermore, the feature ingredients within Meteor are parameterized, so the metric's free parameters can be tuned to values that yield optimal correlation with human judgments. Optimal parameters can be tuned separately for different types of human judgments and for different languages. We discuss the initial design of the Meteor metric, subsequent improvements, and performance in several independent evaluations in recent years.
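
Since the abstract describes the scoring scheme only at a high level, the Python sketch below may help illustrate the general shape of Meteor's parameterized score: a recall-weighted harmonic mean of unigram precision and recall, discounted by a fragmentation penalty based on the number of matched "chunks". It assumes exact unigram matching only (the full metric also aligns stems and synonyms), and the function name, greedy alignment, and default parameter values are illustrative simplifications, not the released implementation.

```python
# Minimal sketch of Meteor's parameterized score, assuming exact-match
# unigrams only; the released metric also matches stems and WordNet
# synonyms. Defaults here mirror the original fixed settings
# (alpha=0.9, beta=3, gamma=0.5); the paper tunes these per language
# and per type of human judgment.

def meteor_sketch(hypothesis, reference, alpha=0.9, beta=3.0, gamma=0.5):
    hyp, ref = hypothesis.split(), reference.split()

    # Greedy one-to-one exact alignment: each reference token may
    # match at most one hypothesis token.
    used = [False] * len(ref)
    matches = []  # (hyp_index, ref_index) pairs, in hypothesis order
    for i, tok in enumerate(hyp):
        for j, rtok in enumerate(ref):
            if not used[j] and tok == rtok:
                used[j] = True
                matches.append((i, j))
                break

    m = len(matches)
    if m == 0:
        return 0.0

    precision = m / len(hyp)
    recall = m / len(ref)
    # Parameterized harmonic mean, weighted toward recall for alpha > 0.5.
    fmean = precision * recall / (alpha * precision + (1 - alpha) * recall)

    # Chunks: maximal runs of matches that are contiguous and identically
    # ordered in both strings; fewer chunks means better word order.
    chunks = 1
    for (i1, j1), (i2, j2) in zip(matches, matches[1:]):
        if i2 != i1 + 1 or j2 != j1 + 1:
            chunks += 1

    # Fragmentation penalty lowers the score for scattered matches.
    penalty = gamma * (chunks / m) ** beta
    return (1.0 - penalty) * fmean


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    # Perfect match: one chunk, tiny residual penalty (~0.998).
    print(meteor_sketch("the cat sat on the mat", ref))
    # Same unigrams, scrambled order: four chunks, score drops (~0.852).
    print(meteor_sketch("the mat sat on the cat", ref))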