Study and correlation analysis of linguistic, perceptual, and automatic machine translation evaluations

Authors:
Mireia Farrús;Marta R. Costa-jussà;Maja Popović
Affiliations:
Universitat Oberta de Catalunya, Barcelona, Spain;Barcelona Media, Innovation Centre, Barcelona, Spain;Deutsches Forschungszentrum für Künstliche Intelligenz, Berlin, Germany
Venue:
Journal of the American Society for Information Science and Technology
Year:
2012

Citing 14
Cited 3

Machine translation: past, present, future

Machine translation: past, present, future
Phrase-Based Statistical Machine Translation

KI '02 Proceedings of the 25th Annual German Conference on AI: Advances in Artificial Intelligence
The mathematics of statistical machine translation: parameter estimation

Computational Linguistics - Special issue on using large corpora: II
BLEU: a method for automatic evaluation of machine translation

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Statistical phrase-based translation

NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Minimum error rate training in statistical machine translation

ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Stochastic finite-state models for spoken language machine translation

NAACL-ANLP-EMTS '00 Proceedings of the 2000 NAACL-ANLP Workshop on Embedded machine translation systems - Volume 5
N-gram-based Machine Translation

Computational Linguistics
Meteor: an automatic metric for MT evaluation with high levels of correlation with human judgments

StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
Linguistic features for automatic evaluation of heterogenous MT systems

StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
Findings of the 2009 workshop on statistical machine translation

StatMT '09 Proceedings of the Fourth Workshop on Statistical Machine Translation
Syntax-oriented evaluation measures for machine translation output

StatMT '09 Proceedings of the Fourth Workshop on Statistical Machine Translation
TER-Plus: paraphrase, semantic, and alignment enhancements to Translation Edit Rate

Machine Translation
Overcoming statistical machine translation limitations: error analysis and proposed solutions for the Catalan---Spanish language pair

Language Resources and Evaluation

Linguistically-augmented Bulgarian-to-English statistical machine translation model

EACL 2012 Proceedings of the Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra)
Statistical machine translation enhancements through linguistic levels: A survey

ACM Computing Surveys (CSUR)
A conjoint analysis framework for evaluating user preferences in machine translation

Machine Translation

Quantified Score

Hi-index	0.00

Visualization

Abstract

Evaluation of machine translation output is an important task. Various human evaluation techniques as well as automatic metrics have been proposed and investigated in the last decade. However, very few evaluation methods take the linguistic aspect into account. In this article, we use an objective evaluation method for machine translation output that classifies all translation errors into one of the five following linguistic levels: orthographic, morphological, lexical, semantic, and syntactic. Linguistic guidelines for the target language are required, and human evaluators use them in to classify the output errors. The experiments are performed on Englishto-Catalan and Spanish-to-Catalan translation outputs generated by four different systems: 2 rule-based and 2 statistical. All translations are evaluated using the 3 following methods: a standard human perceptual evaluation method, several widely used automatic metrics, and the human linguistic evaluation. Pearson and Spearman correlation coefficients between the linguistic, perceptual, and automatic results are then calculated, showing that the semantic level correlates significantly with both perceptual evaluation and automatic metrics.