Evaluation and error analysis of machine translation output are important but difficult tasks. In this article, we propose a framework for automatic error analysis and classification based on identifying the actual erroneous words using the algorithms for computing Word Error Rate (WER) and Position-independent word Error Rate (PER); this is a first step toward automatic evaluation measures that provide more specific information about particular translation problems. The proposed approach enables various types of linguistic knowledge to be used to classify translation errors in many different ways. This work focuses on one possible set-up with five error categories: inflectional errors, errors due to wrong word order, missing words, extra words, and incorrect lexical choices. For each category, we analyze the contribution of various POS classes.

We compared the results of automatic error analysis with the results of human error analysis in order to investigate two possible applications: estimating the contribution of each error type in a given translation output, thereby identifying the main sources of errors for a given translation system; and comparing different translation outputs using the introduced error categories, thereby obtaining more information about the advantages and disadvantages of different systems, the possibilities for improvement, and the advantages and disadvantages of the methods applied for improvement.

We used Arabic-English Newswire and Broadcast News and Chinese-English Newswire outputs created in the framework of the GALE project, several Spanish and English European Parliament outputs generated during the TC-Star project, and three German-English outputs generated in the framework of the Fourth Machine Translation Workshop.
We show that our results correlate very well with the results of a human error analysis, and that all our metrics except extra words reflect well both the differences between versions of the same translation system and the differences between distinct translation systems.
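The WER and PER computations on which the error identification rests are standard: WER is a word-level Levenshtein distance normalized by reference length, while PER ignores word order and compares bags of words. The following is a minimal sketch, not the authors' implementation; the function names and the exact PER normalization (dividing by the reference length) are illustrative assumptions.

```python
from collections import Counter

def wer(ref: str, hyp: str) -> float:
    """Word Error Rate: word-level Levenshtein distance / reference length."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = edit distance between first i reference words and first j hypothesis words
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution / match
    return d[len(r)][len(h)] / len(r)

def per(ref: str, hyp: str) -> float:
    """Position-independent word Error Rate: bag-of-words mismatch."""
    r, h = ref.split(), hyp.split()
    # words matched irrespective of position (multiset intersection)
    matches = sum((Counter(r) & Counter(h)).values())
    errors = max(len(r), len(h)) - matches
    return errors / len(r)
```

A hypothesis that merely reorders the reference has PER 0 but non-zero WER, which is precisely the gap the framework exploits: words counted as errors by WER but not by PER are candidates for reordering errors rather than lexical ones.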