TnT: a statistical part-of-speech tagger
ANLC '00 Proceedings of the sixth conference on Applied natural language processing
BLEU: a method for automatic evaluation of machine translation
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
(Meta-) evaluation of machine translation
StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
Further meta-evaluation of machine translation
StatMT '08 Proceedings of the Third Workshop on Statistical Machine Translation
Manual and automatic evaluation of machine translation between European languages
StatMT '06 Proceedings of the Workshop on Statistical Machine Translation
Findings of the 2009 workshop on statistical machine translation
StatMT '09 Proceedings of the Fourth Workshop on Statistical Machine Translation
TESLA: translation evaluation of sentences with linear-programming-based analysis
WMT '10 Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
All in strings: a powerful string-based automatic MT evaluation metric with multiple granularities
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
Morphemes and POS tags for n-gram based evaluation metrics
WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
Journal of the American Society for Information Science and Technology
Statistical machine translation enhancements through linguistic levels: A survey
ACM Computing Surveys (CSUR)
Fusion of word and letter based metrics for automatic MT evaluation
IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence
We explored novel automatic evaluation measures for machine translation output that are oriented to the syntactic structure of the sentence: the BLEU score computed on detailed part-of-speech (POS) tags, as well as the precision, recall and F-measure obtained on POS n-grams. We also introduced an F-measure based on both word and POS n-grams. Correlations between the new metrics and human judgments were calculated on data from the first, second and third shared tasks of the Workshop on Statistical Machine Translation. Machine translation outputs in four European languages were taken into account: English, Spanish, French and German. The results show that the new measures correlate very well with human judgments and are competitive with the widely used BLEU, METEOR and TER metrics.
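The core computation described above — precision, recall and F-measure over POS n-grams — can be sketched as follows. This is a minimal illustration, not the authors' exact implementation: the function name, the choice of maximum n-gram order, and the uniform averaging over orders are assumptions; the POS sequences would in practice come from a tagger such as TnT.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the multiset of n-grams of a token (or POS tag) sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def pos_ngram_fmeasure(hyp_pos, ref_pos, max_n=4):
    """Precision, recall and F1 over POS n-grams, averaged over n = 1..max_n.

    hyp_pos: POS tag sequence of the MT output (hypothesis).
    ref_pos: POS tag sequence of the reference translation.
    """
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = ngrams(hyp_pos, n)
        ref = ngrams(ref_pos, n)
        # Counter intersection clips each n-gram's count, as in BLEU.
        overlap = sum((hyp & ref).values())
        precisions.append(overlap / max(sum(hyp.values()), 1))
        recalls.append(overlap / max(sum(ref.values()), 1))
    p = sum(precisions) / max_n
    r = sum(recalls) / max_n
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

The same function applied to word sequences instead of POS sequences gives the word-level component; a combined word/POS F-measure can then be formed by averaging the two scores.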