The mathematics of statistical machine translation: parameter estimation
Computational Linguistics - Special issue on using large corpora: II
BLEU: a method for automatic evaluation of machine translation
ACL '02 Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics
Semi-supervised training for the averaged perceptron POS tagger
EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
Further meta-evaluation of machine translation
StatMT '08 Proceedings of the Third Workshop on Statistical Machine Translation
Findings of the 2009 Workshop on Statistical Machine Translation
StatMT '09 Proceedings of the Fourth Workshop on Statistical Machine Translation
WMT '10 Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
Findings of the 2011 Workshop on Statistical Machine Translation
WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
Morpheme- and POS-based IBM1 scores and language model scores for translation quality estimation
WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
DFKI's SMT system for WMT 2012
WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
Quality estimation for machine translation: some lessons learned
Machine Translation
Current metrics for evaluating machine translation quality have the major drawback that they require human-quality reference translations. We propose a truly automatic evaluation metric based on IBM1 lexicon probabilities which does not require any reference translations. Several variants of IBM1 scores are explored systematically in order to identify the most promising directions. Correlations between the new metrics and human judgments are computed on the data of the third, fourth and fifth shared tasks of the Workshop on Statistical Machine Translation. Five European languages are taken into account: English, Spanish, French, German and Czech. The results show that the IBM1 scores are competitive with the classic evaluation metrics, the most promising being IBM1 scores calculated on morphemes and POS 4-grams.
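The core quantity behind such reference-free metrics is the IBM Model 1 sentence score: for each hypothesis word, average the lexicon probabilities over all source words (plus a NULL token), then combine per-word averages in log space. The following is a minimal sketch under stated assumptions: the lexicon table `t`, the function name `ibm1_score`, the smoothing floor `1e-10`, and length normalization are illustrative choices, not details taken from the paper.

```python
import math

def ibm1_score(src_tokens, hyp_tokens, t, floor=1e-10):
    """Length-normalized IBM1 log score of a hypothesis given the source.

    t: dict mapping (target_word, source_word) -> lexicon probability,
       assumed to come from a pre-trained Model 1 lexicon (hypothetical here).
    floor: small constant for unseen word pairs, to avoid log(0).
    """
    # NULL token lets hypothesis words align to "nothing" in the source.
    src = src_tokens + ["NULL"]
    log_prob = 0.0
    for e in hyp_tokens:
        # Average translation probability of e over all source positions.
        avg = sum(t.get((e, f), floor) for f in src) / len(src)
        log_prob += math.log(avg)
    # Normalize by hypothesis length so scores are comparable across sentences.
    return log_prob / len(hyp_tokens)
```

With a toy lexicon, a hypothesis whose words are plausible translations of the source scores higher (less negative) than an unrelated one; the same scheme applies unchanged when tokens are morphemes or POS n-grams instead of surface words.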