A number of approaches to automatic MT evaluation based on deep linguistic knowledge have been suggested. However, n-gram-based metrics remain the dominant approach today, mainly because the advantages of employing deeper linguistic information have not yet been clarified. In this work, we propose a novel approach to the meta-evaluation of MT evaluation metrics, since correlation coefficients against human judgments do not reveal details about the advantages and disadvantages of particular metrics. We then use this approach to investigate the benefits of introducing linguistic features into evaluation metrics. Overall, our experiments show that (i) lexical and linguistic metrics offer complementary advantages, and (ii) combining both kinds of metrics yields the most robust meta-evaluation performance.
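The standard meta-evaluation setup the abstract refers to can be illustrated with a minimal sketch: score each translation with a lexical metric and a linguistic metric, combine the two, and compare each metric's scores against human judgments via Pearson correlation. All scores below are hypothetical, and the combination scheme (a plain average of the two metric scores) is an illustrative assumption, not necessarily the method used in the paper.

```python
# Minimal sketch of metric meta-evaluation: correlate automatic metric
# scores with human judgments, including a simple combined metric.
# All per-sentence scores here are hypothetical illustration data.

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical scores: a lexical (n-gram-based) metric, a linguistic
# (e.g. syntax-aware) metric, and human adequacy judgments.
lexical    = [0.42, 0.55, 0.31, 0.70, 0.48]
linguistic = [0.50, 0.49, 0.35, 0.66, 0.52]
human      = [0.40, 0.60, 0.30, 0.75, 0.50]

# Assumed combination: uniform average of the two metric scores.
combined = [(a + b) / 2 for a, b in zip(lexical, linguistic)]

for name, scores in [("lexical", lexical),
                     ("linguistic", linguistic),
                     ("combined", combined)]:
    print(f"{name:10s} r = {pearson(scores, human):.3f}")
```

In this toy setup, a higher correlation `r` with the human column indicates a better metric; the point of the paper, however, is that a single correlation coefficient hides *which* translations each metric handles well, motivating a finer-grained meta-evaluation.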