Evaluation and error analysis of machine translation output are important but difficult tasks. In this article, we propose a framework for automatic error analysis and classification based on identifying the actual erroneous words using the algorithms for computing Word Error Rate (WER) and Position-independent word Error Rate (PER); this is a first step toward automatic evaluation measures that provide more specific information about particular translation problems. The proposed approach enables various types of linguistic knowledge to be used to classify translation errors in many different ways. This work focuses on one possible set-up with five error categories: inflectional errors, errors due to wrong word order, missing words, extra words, and incorrect lexical choices. For each category, we analyze the contribution of various POS classes.

We compared the results of automatic error analysis with the results of human error analysis in order to investigate two possible applications: estimating the contribution of each error type in a given translation output, thereby identifying the main sources of errors for a given translation system; and comparing different translation outputs using the introduced error categories, thereby obtaining more information about the advantages and disadvantages of different systems, the possibilities for improvement, and the advantages and disadvantages of the methods applied for improvement.

We used Arabic-English Newswire and Broadcast News and Chinese-English Newswire outputs created in the framework of the GALE project, several Spanish and English European Parliament outputs generated during the TC-Star project, and three German-English outputs generated in the framework of the Fourth Machine Translation Workshop.
We show that our results correlate very well with the results of a human error analysis, and that all our metrics except extra words reflect well both the differences between versions of the same translation system and the differences between distinct translation systems.
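The WER and PER computations on which the error identification rests are standard: WER is a word-level Levenshtein distance normalized by reference length, while PER ignores word order and compares bags of words. The following is a minimal sketch, not the authors' implementation; the function names and the exact PER normalization (dividing by the reference length) are illustrative assumptions.

```python
from collections import Counter

def wer(ref: str, hyp: str) -> float:
    """Word Error Rate: word-level Levenshtein distance / reference length."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = edit distance between first i reference words and first j hypothesis words
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution / match
    return d[len(r)][len(h)] / len(r)

def per(ref: str, hyp: str) -> float:
    """Position-independent word Error Rate: bag-of-words mismatch."""
    r, h = ref.split(), hyp.split()
    # words matched irrespective of position (multiset intersection)
    matches = sum((Counter(r) & Counter(h)).values())
    errors = max(len(r), len(h)) - matches
    return errors / len(r)
```

A hypothesis that merely reorders the reference has PER 0 but non-zero WER, which is precisely the gap the framework exploits: words counted as errors by WER but not by PER are candidates for reordering errors rather than lexical ones.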