TnT: a statistical part-of-speech tagger
ANLC '00 Proceedings of the sixth conference on Applied natural language processing
BLEU: a method for automatic evaluation of machine translation
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
(Meta-) evaluation of machine translation
StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
Further meta-evaluation of machine translation
StatMT '08 Proceedings of the Third Workshop on Statistical Machine Translation
Manual and automatic evaluation of machine translation between European languages
StatMT '06 Proceedings of the Workshop on Statistical Machine Translation
Findings of the 2009 workshop on statistical machine translation
StatMT '09 Proceedings of the Fourth Workshop on Statistical Machine Translation
TESLA: translation evaluation of sentences with linear-programming-based analysis
WMT '10 Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
All in strings: a powerful string-based automatic MT evaluation metric with multiple granularities
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
Morphemes and POS tags for n-gram based evaluation metrics
WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
Journal of the American Society for Information Science and Technology
Statistical machine translation enhancements through linguistic levels: A survey
ACM Computing Surveys (CSUR)
Fusion of word and letter based metrics for automatic MT evaluation
IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence
We explored novel automatic evaluation measures for machine translation output that are oriented to the syntactic structure of the sentence: the BLEU score computed on detailed part-of-speech (POS) tags, as well as the precision, recall and F-measure obtained on POS n-grams. We also introduced an F-measure based on both word and POS n-grams. Correlations between the new metrics and human judgments were calculated on data from the first, second and third shared tasks of the Workshop on Statistical Machine Translation. Machine translation outputs in four European languages were taken into account: English, Spanish, French and German. The results show that the new measures correlate very well with human judgments and are competitive with the widely used BLEU, METEOR and TER metrics.
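The core computation described above — precision, recall and F-measure over POS n-grams — can be sketched as follows. This is a minimal illustration, not the authors' exact implementation: the function name, the choice of maximum n-gram order, and the uniform averaging over orders are assumptions; the POS sequences would in practice come from a tagger such as TnT.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the multiset of n-grams of a token (or POS tag) sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def pos_ngram_fmeasure(hyp_pos, ref_pos, max_n=4):
    """Precision, recall and F1 over POS n-grams, averaged over n = 1..max_n.

    hyp_pos: POS tag sequence of the MT output (hypothesis).
    ref_pos: POS tag sequence of the reference translation.
    """
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = ngrams(hyp_pos, n)
        ref = ngrams(ref_pos, n)
        # Counter intersection clips each n-gram's count, as in BLEU.
        overlap = sum((hyp & ref).values())
        precisions.append(overlap / max(sum(hyp.values()), 1))
        recalls.append(overlap / max(sum(ref.values()), 1))
    p = sum(precisions) / max_n
    r = sum(recalls) / max_n
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

The same function applied to word sequences instead of POS sequences gives the word-level component; a combined word/POS F-measure can then be formed by averaging the two scores.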