Machine translation systems are not reliable enough to be used "as is": except for the simplest tasks, their output can only serve to grasp the general meaning of a text or to assist human translators. The purpose of confidence measures is to detect erroneous words or sentences produced by a machine translation system. In this article, after reviewing the mathematical foundations of confidence estimation, we compare several state-of-the-art confidence measures, predictive parameters, and classifiers. We also propose two original confidence measures based on mutual information, as well as a method for automatically generating data to train and test classifiers. Applying these techniques to data from the WMT 2008 campaign, we found that the best individual confidence measures yielded an Equal Error Rate (EER) of 36.3% at the word level and 34.2% at the sentence level; combining different measures reduced these rates to 35.0% and 29.0%, respectively. We also present the results of an experiment aimed at determining how helpful confidence measures are in a post-editing task. Preliminary results suggest that our system is not yet ready to help post-editors efficiently, but we now have both the software and a protocol to apply in further experiments, and user feedback has highlighted aspects that must be improved to make confidence measures more helpful.
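To make the reported metric concrete, the sketch below shows one common way to compute the Equal Error Rate for a binary word-confidence classifier: sweep a decision threshold over the confidence scores and locate the point where the false-acceptance rate (an erroneous word accepted as correct) equals the false-rejection rate (a correct word flagged as erroneous). The scores, labels, and threshold sweep here are illustrative assumptions, not the paper's actual data or implementation.

```python
def equal_error_rate(scores, labels):
    """Return the EER of a confidence classifier.

    scores: confidence scores (higher = more likely correct)
    labels: 1 if the word is actually correct, 0 if erroneous
    """
    neg = labels.count(0)  # number of erroneous words
    pos = labels.count(1)  # number of correct words
    best_gap, eer = float("inf"), 1.0
    # Sweep candidate thresholds over the observed score values.
    for t in sorted(set(scores)):
        # False acceptance: erroneous word accepted as correct (score >= t).
        fa = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t)
        # False rejection: correct word rejected as erroneous (score < t).
        fr = sum(1 for s, y in zip(scores, labels) if y == 1 and s < t)
        far, frr = fa / neg, fr / pos
        # The EER is where the two rates cross; keep the closest point.
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer

# A perfectly separable toy example: correct words score high,
# erroneous words score low, so the EER is 0.
print(equal_error_rate([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]))  # → 0.0
```

With real, overlapping score distributions the two rates never cross exactly, which is why the sketch returns the average of the two rates at the closest crossing point.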