Statistical decision-tree models for parsing. In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics (ACL '95).
BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL '02).
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL '04).
Coarse-to-fine n-best parsing and MaxEnt discriminative reranking. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL '05).
Evaluating machine translation with LFG dependencies. Machine Translation.
Labelled dependencies in machine translation evaluation. In Proceedings of the Second Workshop on Statistical Machine Translation (StatMT '07).
Fluency, adequacy, or HTER? Exploring different human judgments with a tunable MT metric. In Proceedings of the Fourth Workshop on Statistical Machine Translation (StatMT '09).
The DCU dependency-based metric in WMT-MetricsMATR 2010. In Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR (WMT '10).
Exploiting syntactic relationships in a phrase-based decoder: an exploration. Machine Translation.
Linguistic measures for automatic machine translation evaluation. Machine Translation.
Joint reranking of parsing and word recognition with automatic segmentation. Computer Speech and Language.
Corroborating text evaluation results with heterogeneous measures. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '11).
Fusion of word and letter based metrics for automatic MT evaluation. In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence (IJCAI '13).
Recent efforts to develop new machine translation evaluation methods have tried to account for allowable wording differences, either in terms of syntactic structure or in terms of synonyms and paraphrases. This paper focuses primarily on syntactic structure: using a statistical parser, it combines scores from partial syntactic dependency matches with standard local n-gram matches, taking advantage of N-best parse probabilities. The resulting metric, expected dependency pair match (EDPM), is shown to outperform BLEU and TER both in correlation with human judgments and as a predictor of HTER. Further, we combine the syntactic features of EDPM with the alternative-wording features of TERp, showing a benefit to accounting for syntactic structure on top of semantic equivalence features.
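The core idea of weighting partial dependency matches by N-best parse probabilities can be illustrated with a minimal sketch. The `expected_dpm` function below is an assumption-laden simplification, not the paper's actual formulation: it represents each parse as a list of hypothetical `(head, relation, dependent)` triples, takes invented N-best lists of `(probability, triples)` pairs for hypothesis and reference, and returns the dependency-pair F1 averaged over all parse pairings, weighted by normalized joint parse probabilities.

```python
from collections import Counter

def pair_overlap_f1(hyp_pairs, ref_pairs):
    """Harmonic mean of precision and recall over dependency-pair multisets."""
    hyp, ref = Counter(hyp_pairs), Counter(ref_pairs)
    overlap = sum((hyp & ref).values())  # multiset intersection size
    if overlap == 0:
        return 0.0
    p = overlap / sum(hyp.values())
    r = overlap / sum(ref.values())
    return 2 * p * r / (p + r)

def expected_dpm(hyp_nbest, ref_nbest):
    """Expected dependency-pair match: pair-overlap F1 averaged over all
    N-best parse pairings, weighted by normalized joint parse probabilities.
    (A sketch of the idea only; the paper's EDPM metric is more involved.)"""
    total_w = sum(w for w, _ in hyp_nbest) * sum(w for w, _ in ref_nbest)
    score = 0.0
    for hw, hp in hyp_nbest:
        for rw, rp in ref_nbest:
            score += (hw * rw / total_w) * pair_overlap_f1(hp, rp)
    return score

# Toy N-best lists (hypothetical dependency triples, invented probabilities).
hyp = [(0.7, [("saw", "dobj", "dog"), ("saw", "nsubj", "I")]),
       (0.3, [("saw", "nsubj", "I")])]
ref = [(1.0, [("saw", "dobj", "dog"), ("saw", "nsubj", "I")])]
print(expected_dpm(hyp, ref))  # → 0.9
```

Averaging over the N-best list, rather than scoring only the 1-best parse, is what lets the metric give partial credit when the parser is uncertain: a low-probability parse that disagrees with the reference pulls the score down only in proportion to its probability.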