Making large-scale support vector machine learning practical
Advances in kernel methods
BLEU: a method for automatic evaluation of machine translation
ACL '02 Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics
Confidence estimation for translation prediction
CONLL '03 Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003
Improving query translation with confidence estimation for cross language information retrieval
CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
Confidence estimation for machine translation
COLING '04 Proceedings of the 20th International Conference on Computational Linguistics
ORANGE: a method for evaluating automatic evaluation metrics for machine translation
COLING '04 Proceedings of the 20th International Conference on Computational Linguistics
Translating with non-contiguous phrases
HLT '05 Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing
Automatic evaluation of machine translation quality using n-gram co-occurrence statistics
HLT '02 Proceedings of the Second International Conference on Human Language Technology Research
Meteor: an automatic metric for MT evaluation with high levels of correlation with human judgments
StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
Further meta-evaluation of machine translation
StatMT '08 Proceedings of the Third Workshop on Statistical Machine Translation
A smorgasbord of features for automatic MT evaluation
StatMT '08 Proceedings of the Third Workshop on Statistical Machine Translation
Findings of the 2009 workshop on statistical machine translation
StatMT '09 Proceedings of the Fourth Workshop on Statistical Machine Translation
Textual entailment features for machine translation evaluation
StatMT '09 Proceedings of the Fourth Workshop on Statistical Machine Translation
PORTAGE: with smoothed phrase tables and segment choice models
StatMT '06 Proceedings of the Workshop on Statistical Machine Translation
Towards predicting post-editing productivity
Machine Translation
Findings of the 2011 Workshop on Statistical Machine Translation
WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
Non-linear models for confidence estimation
WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
Match without a referee: evaluating MT adequacy without reference translations
WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
Agent metaphor for machine translation mediated communication
IUI '13 Proceedings of the 2013 International Conference on Intelligent User Interfaces
Oracle decoding as a new way to analyze phrase-based machine translation
Machine Translation
Identifying useful human correction feedback from an on-line machine translation service
IJCAI '13 Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence
Quality estimation for machine translation: some lessons learned
Machine Translation
Sentence-level ranking with quality estimation
Machine Translation
Most evaluation metrics for machine translation (MT) require a reference translation for each sentence in order to produce a score reflecting certain aspects of its quality. The de facto standard metrics, BLEU and NIST, are known to correlate well with human judgments at the corpus level, but not at the segment level. To overcome these two limitations (reference dependence and poor segment-level correlation), we address the evaluation of MT quality as a prediction task, where reference-independent features are extracted from the input sentences and their translations, and a quality score is produced by models learned from training data. We show that this approach correlates better with human judgments than commonly used metrics, even when the models are trained on different MT systems, language pairs and text domains.