Despite much research on machine translation (MT) evaluation, there is surprisingly little work that directly measures users' intuitive or emotional preferences regarding different types of MT errors. However, eliciting and modeling user preferences is an important prerequisite for research on user adaptation and customization of MT engines. In this paper we explore the use of conjoint analysis as a formal quantitative framework for assessing users' relative preferences over different types of translation errors. We apply our approach to the analysis of MT output produced when translating public health documents from English into Spanish. Our results indicate that word order errors are clearly the most dispreferred error type, followed by word sense, morphological, and function word errors. The conjoint analysis-based model predicts user preferences more accurately than a baseline model that simply chooses the translation with the fewest errors overall. Additionally, we analyze the effect of using a crowd-sourced respondent population versus a sample of domain experts and observe that the main preference effects are remarkably stable across the two samples.
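To make the conjoint-analysis idea concrete, the sketch below shows one simple way such a preference model could be realized: candidate translations are described by counts of the four error types, per-error-type "part-worths" are estimated from user ratings, and pairwise preferences are then predicted from the fitted utilities and compared against a fewest-errors baseline. This is a minimal illustration under our own assumptions; the synthetic data, the least-squares formulation, and all names are hypothetical and are not taken from the paper.

```python
# Minimal sketch of a conjoint-style preference model over MT error types.
# The data, variable names, and least-squares estimator are illustrative
# assumptions, not the authors' actual experimental setup.
import numpy as np

ERROR_TYPES = ["word_order", "word_sense", "morphological", "function_word"]

# Each row: counts of each error type in one candidate translation,
# paired with a (hypothetical) user acceptability rating for that candidate.
profiles = np.array([
    [2, 0, 1, 0],
    [0, 1, 0, 2],
    [1, 1, 1, 1],
    [0, 0, 2, 1],
    [3, 1, 0, 0],
    [0, 2, 1, 0],
], dtype=float)
ratings = np.array([2.0, 4.0, 3.0, 4.5, 1.5, 3.5])  # higher = more acceptable

# Estimate per-error-type part-worths by ordinary least squares:
# rating ~ intercept + sum_k w_k * error_count_k
X = np.hstack([np.ones((len(profiles), 1)), profiles])
coef, *_ = np.linalg.lstsq(X, ratings, rcond=None)
intercept, part_worths = coef[0], coef[1:]

for name, w in zip(ERROR_TYPES, part_worths):
    print(f"{name:15s} part-worth: {w:+.2f}")  # more negative = more dispreferred

def conjoint_preference(errors_a, errors_b):
    """Prefer the candidate with the higher predicted utility."""
    score_a = intercept + part_worths @ np.asarray(errors_a, dtype=float)
    score_b = intercept + part_worths @ np.asarray(errors_b, dtype=float)
    return "A" if score_a >= score_b else "B"

def fewest_errors_baseline(errors_a, errors_b):
    """Baseline: prefer the candidate with fewer errors overall."""
    return "A" if sum(errors_a) <= sum(errors_b) else "B"

# Example pair: A has one word-order error, B has two function-word errors.
pair = ([1, 0, 0, 0], [0, 0, 0, 2])
print("conjoint model prefers:", conjoint_preference(*pair))
print("fewest-errors baseline prefers:", fewest_errors_baseline(*pair))
```

In a toy setup like this, the two decision rules can diverge exactly when error types carry unequal weight: the baseline counts all errors alike, while the conjoint-style model can prefer a translation with more errors of a lightly penalized type over one with a single heavily penalized (e.g., word order) error.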