Orange: from experimental machine learning to interactive data mining
PKDD '04 Proceedings of the 8th European Conference on Principles and Practice of Knowledge Discovery in Databases
BLEU: a method for automatic evaluation of machine translation
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Learning accurate, compact, and interpretable tree annotation
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Confidence estimation for machine translation
COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Word-level confidence estimation for machine translation using phrase-based translation models
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Label ranking by learning pairwise preferences
Artificial Intelligence
(Meta-) evaluation of machine translation
StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
Meteor: an automatic metric for MT evaluation with high levels of correlation with human judgments
StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
Sentence level machine translation evaluation as a ranking problem: one step aside from BLEU
StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
Further meta-evaluation of machine translation
StatMT '08 Proceedings of the Third Workshop on Statistical Machine Translation
Ranking vs. regression in machine translation evaluation
StatMT '08 Proceedings of the Third Workshop on Statistical Machine Translation
Findings of the 2009 workshop on statistical machine translation
StatMT '09 Proceedings of the Fourth Workshop on Statistical Machine Translation
The effect of correcting grammatical errors on parse probabilities
IWPT '09 Proceedings of the 11th International Conference on Parsing Technologies
Machine translation evaluation versus quality estimation
Machine Translation
Bridging SMT and TM with translation recommendation
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
WMT '10 Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
Bioinformatics
Findings of the 2011 Workshop on Statistical Machine Translation
WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Putting human assessments of machine translation systems in order
WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
Findings of the 2012 workshop on statistical machine translation
WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
Linguistic features for quality estimation
WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
The SDL language weaver systems in the WMT12 quality estimation shared task
WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
Combining quality prediction and system selection for improved automatic translation output
WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
Starting from human annotations, we present a machine-learning strategy that performs preference ranking over alternative machine translations of the same source, at the sentence level. Rankings are decomposed into pairwise comparisons so that they can be learned by binary classifiers, using black-box features derived from linguistic analysis. To recompose a full ranking from the classifier's pairwise decisions, each decision is weighted by its classification probability, which increases the correlation coefficient by 80%. We also demonstrate several configurations of successful automatic ranking models. The best configurations achieve a Kendall's tau correlation of 0.27 with human judgments. Although the method does not use reference translations, this correlation is comparable to that achieved by state-of-the-art reference-aware automatic evaluation metrics such as smoothed BLEU, METEOR, and Levenshtein distance.
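The decompose/recompose scheme described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the system names, the latent `quality` scores, and the `prob` stand-in for the trained binary classifier are all hypothetical; a real model would estimate the win probability from black-box linguistic features of the two candidate translations.

```python
from itertools import combinations

def pairwise_decompose(ranking):
    """Decompose a ranking (best-first list of system ids) into
    ordered (winner, loser) pairs -- the binary training instances."""
    return [(ranking[i], ranking[j])
            for i, j in combinations(range(len(ranking)), 2)]

def recompose(systems, prob):
    """Recompose a full ranking from pairwise decisions, weighting
    each decision by the classifier's probability P(a beats b)."""
    score = {s: 0.0 for s in systems}
    for a, b in combinations(systems, 2):
        p = prob(a, b)          # classifier confidence that a beats b
        score[a] += p
        score[b] += 1.0 - p
    return sorted(systems, key=lambda s: -score[s])

def kendall_tau(rank_a, rank_b):
    """Kendall's tau between two rankings of the same items
    (no ties: every pair is either concordant or discordant)."""
    pos_a = {s: i for i, s in enumerate(rank_a)}
    pos_b = {s: i for i, s in enumerate(rank_b)}
    conc = disc = 0
    for x, y in combinations(rank_a, 2):
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0:
            conc += 1
        else:
            disc += 1
    return (conc - disc) / (conc + disc)

# Hypothetical stand-in for the trained classifier: each system
# carries a latent quality score, and the win probability is
# proportional to relative quality (Bradley-Terry style).
quality = {"sysA": 0.9, "sysB": 0.5, "sysC": 0.2}
prob = lambda a, b: quality[a] / (quality[a] + quality[b])

human = ["sysA", "sysB", "sysC"]
pairs = pairwise_decompose(human)      # binary training instances
predicted = recompose(list(quality), prob)
tau = kendall_tau(human, predicted)
```

With these toy scores the recomposed ranking matches the human one exactly (tau = 1.0); the paper's reported 0.27 reflects real classifier noise on real translation data.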