This paper describes our submissions to the machine translation evaluation shared task at ACL WMT-08. Our primary submission is the Meteor metric, tuned to optimize correlation with human rankings of translation hypotheses. We show a significant improvement in correlation compared to the earlier version of the metric, which was tuned to optimize correlation with traditional adequacy and fluency judgments. We also describe m-BLEU and m-TER, enhanced versions of two other widely used metrics, BLEU and TER respectively. Both replace the exact word matching used in the original metrics with Meteor's flexible matching, which is based on stemming and WordNet.
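To make the idea of flexible matching concrete, here is a minimal sketch of staged unigram matching: exact matches first, then stem matches, then synonym matches. It is an illustration under loud assumptions, not the Meteor, m-BLEU, or m-TER implementation. The suffix stripper `toy_stem` and the hand-written `SYNONYMS` table are hypothetical stand-ins for a real stemmer and for WordNet, and the score is plain unigram precision rather than any of the three metrics.

```python
# Hypothetical stand-ins: a crude suffix stripper instead of a real stemmer,
# and a tiny hand-written synonym table instead of WordNet.
SUFFIXES = ("ing", "ed", "es", "s")
SYNONYMS = {frozenset({"quick", "fast"})}


def toy_stem(word):
    """Strip the first listed suffix that leaves at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def same_exact(a, b):
    return a == b


def same_stem(a, b):
    return toy_stem(a) == toy_stem(b)


def same_synonym(a, b):
    return frozenset({a, b}) in SYNONYMS


def count_matches(hypothesis, reference, stages):
    """Greedily match hypothesis words to reference words, one stage at a time.

    Each reference word can be consumed by at most one hypothesis word, and
    words matched in an earlier stage are not reconsidered in later stages.
    """
    hyp_left = list(hypothesis)
    ref_left = list(reference)
    matched = 0
    for is_match in stages:
        still_unmatched = []
        for h in hyp_left:
            for i, r in enumerate(ref_left):
                if is_match(h, r):
                    del ref_left[i]  # consume the reference word
                    matched += 1
                    break
            else:
                still_unmatched.append(h)
        hyp_left = still_unmatched
    return matched


def unigram_precision(hypothesis, reference, stages):
    """Fraction of hypothesis words matched under the given stages."""
    if not hypothesis:
        return 0.0
    return count_matches(hypothesis, reference, stages) / len(hypothesis)


hyp = "the fast dogs jumped".split()
ref = "the quick dog jumps".split()
exact_only = unigram_precision(hyp, ref, [same_exact])
with_stems = unigram_precision(hyp, ref, [same_exact, same_stem])
with_synonyms = unigram_precision(hyp, ref, [same_exact, same_stem, same_synonym])
```

On this toy pair, exact matching credits only "the", the stem stage additionally pairs "dogs" with "dog" and "jumped" with "jumps", and the synonym stage pairs "fast" with "quick". The same substitution of a more permissive matcher for the exact-match test is the kind of change the abstract describes for BLEU and TER, though how those metrics integrate it is not specified here.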