Fully automatic semantic MT evaluation

Authors:
Chi-kiu Lo;Anand Karthik Tumuluru;Dekai Wu
Affiliations:
Hong Kong University of Science and Technology;Hong Kong University of Science and Technology;Hong Kong University of Science and Technology
Venue:
WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
Year:
2012

Citing 11
Cited 0

BLEU: a method for automatic evaluation of machine translation

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Automatic evaluation of machine translation quality using n-gram co-occurrence statistics

HLT '02 Proceedings of the second international conference on Human Language Technology Research
(Meta-) evaluation of machine translation

StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
Linguistic features for automatic evaluation of heterogenous MT systems

StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
Further meta-evaluation of machine translation

StatMT '08 Proceedings of the Third Workshop on Statistical Machine Translation
A smorgasbord of features for automatic MT evaluation

StatMT '08 Proceedings of the Third Workshop on Statistical Machine Translation
Manual and automatic evaluation of machine translation between European languages

StatMT '06 Proceedings of the Workshop on Statistical Machine Translation
Findings of the 2010 Joint Workshop on Statistical Machine Translation and Metrics for Machine Translation

WMT '10 Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
MEANT: an inexpensive, high-accuracy, semi-automatic metric for evaluating translation utility via semantic frames

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Structured vs. flat semantic role representations for machine translation evaluation

SSST-5 Proceedings of the Fifth Workshop on Syntax, Semantics and Structure in Statistical Translation
SMT versus AI redux: how semantic fames evaluate MT more accurately

IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Volume Three

Quantified Score

Hi-index	0.00

Visualization

Abstract

We introduce the first fully automatic, fully semantic frame based MT evaluation metric, MEANT, that outperforms all other commonly used automatic metrics in correlating with human judgment on translation adequacy. Recent work on HMEANT, which is a human metric, indicates that machine translation can be better evaluated via semantic frames than other evaluation paradigms, requiring only minimal effort from monolingual humans to annotate and align semantic frames in the reference and machine translations. We propose a surprisingly effective Occam's razor automation of HMEANT that combines standard shallow semantic parsing with a simple maximum weighted bipartite matching algorithm for aligning semantic frames. The matching criterion is based on lexical similarity scoring of the semantic role fillers through a simple context vector model which can readily be trained using any publicly available large monolingual corpus. Sentence level correlation analysis, following standard NIST MetricsMATR protocol, shows that this fully automated version of HMEANT achieves significantly higher Kendall correlation with human adequacy judgments than BLEU, NIST, METEOR, PER, CDER, WER, or TER. Furthermore, we demonstrate that performing the semantic frame alignment automatically actually tends to be just as good as performing it manually. Despite its high performance, fully automated MEANT is still able to preserve HMEANT's virtues of simplicity, representational transparency, and inexpensiveness.