References:
SIAM Journal on Discrete Mathematics.
The Minimum Feedback Arc Set Problem is NP-Hard for Tournaments. Combinatorics, Probability and Computing.
WMT '10: Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR.
A grain of salt for the WMT manual evaluation. WMT '11: Proceedings of the Sixth Workshop on Statistical Machine Translation.
Findings of the 2011 Workshop on Statistical Machine Translation. WMT '11: Proceedings of the Sixth Workshop on Statistical Machine Translation.
Findings of the 2012 Workshop on Statistical Machine Translation. WMT '12: Proceedings of the Seventh Workshop on Statistical Machine Translation.
The efficacy of human post-editing for language translation. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems.
Sentence-level ranking with quality estimation. Machine Translation.
Human assessment is often considered the gold standard in the evaluation of translation systems. But for the evaluation to be meaningful, the rankings obtained from human assessment must be consistent and repeatable. A recent analysis by Bojar et al. (2011) raised several concerns about the rankings derived from human assessments of English-Czech translation systems in the 2010 Workshop on Machine Translation. We extend their analysis to all of the ranking tasks from 2010 and 2011, and show through an extension of their reasoning that the ranking is naturally cast as an instance of finding the minimum feedback arc set in a tournament, a well-known NP-hard problem. All instances of this problem in the workshop data are efficiently solvable, but in some cases the rankings this formulation produces are surprisingly different from the previously published ones. This leads to strong caveats and recommendations for both producers and consumers of these rankings.
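To make the formulation concrete: in a tournament, every pair of systems is connected by one directed arc (here, from the system preferred in more pairwise human judgments to the other), and a minimum feedback arc set ranking is an ordering of the systems that minimizes the number of arcs pointing "backwards" against the ranking. The sketch below is illustrative only and is not the solver used in the paper; the function name and the brute-force search over permutations are assumptions, feasible only for small numbers of systems, whereas larger instances would call for an exact solver such as an ILP.

```python
from itertools import permutations

def mfas_ranking(systems, wins):
    """Order systems to minimize feedback arcs in the pairwise-win tournament.

    systems: list of system names.
    wins[(a, b)]: number of human judgments preferring system a over b.
    An arc a -> b exists when a beats b in strictly more judgments; ties
    yield no arc.  Brute force over all orderings (illustrative sketch
    only; exponential in the number of systems).
    """
    def backward_arcs(order):
        pos = {s: i for i, s in enumerate(order)}
        count = 0
        for a in systems:
            for b in systems:
                if a != b and wins.get((a, b), 0) > wins.get((b, a), 0):
                    # Arc a -> b; it is a feedback arc if b is ranked above a.
                    if pos[b] < pos[a]:
                        count += 1
        return count

    return min(permutations(systems), key=backward_arcs)
```

For an acyclic set of judgments (A beats B and C, B beats C) the ranking simply follows the wins; when the judgments contain a preference cycle (A over B, B over C, C over A), every ordering leaves at least one feedback arc, which is precisely the inconsistency the abstract is concerned with.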