The adoption of statistical machine translation (SMT) systems in the professional translation industry is still limited by the unreliability of SMT output, whose quality varies widely. It would therefore be valuable for MT systems to assess their own translations with automatically derived quality measures. Predicting such quality measures was the goal of a shared task at the 2012 Workshop on Statistical Machine Translation. In this contribution, we first report our results for this shared task, detailing the features that we found most predictive of quality. In the second part, we reexamine the shared task data and protocol, show that several factors contributed to the difficulty of the task, and discuss alternative evaluation designs.