Fuzzy matching for N-gram-based MT evaluation

Authors:
Liangyou Li;Zhengxian Gong
Affiliations:
School of Computer Science & Technology, Soochow University, Suzhou, Jiangsu, China;School of Computer Science & Technology, Soochow University, Suzhou, Jiangsu, China
Venue:
CLSW'12 Proceedings of the 13th Chinese conference on Chinese Lexical Semantics
Year:
2012

Abstract

N-gram-based metrics are widely used in the automatic evaluation of machine translation. However, most of them are weakened by their strict n-gram matching policy. In particular, exact matching treats synonyms as entirely different words and therefore yields unreasonable estimates. This paper introduces fuzzy matching for n-grams, which relies on a semantic similarity function based on WordNet. When incorporated into BLEU, the representative n-gram-based evaluation metric, it is used to find the match with the highest similarity. Because WordNet contributes more to high-order n-grams and fuzzy matching performs well even with fewer references, experiments on MTC Part 2 (LDC2003T17) show that the proposed method greatly improves the correlation between BLEU and human evaluation at both the segment level and the document level. Furthermore, the improvement from incorporating fuzzy matching into BLEU is more significant at the document level.
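
The idea of replacing exact n-gram matching with a highest-similarity match can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the similarity formula, so the hand-written table `TOY_SIMILARITY` stands in for a WordNet-based word similarity, the n-gram score is assumed to be the mean of position-wise word similarities, and each reference n-gram is assumed to be consumed greedily once matched. All function names are hypothetical.

```python
from collections import Counter

# Assumption: a toy lookup table standing in for a WordNet-based similarity.
# A real system would derive these scores from WordNet instead.
TOY_SIMILARITY = {
    frozenset(("car", "automobile")): 0.9,
    frozenset(("quick", "fast")): 0.8,
}


def word_similarity(a, b):
    """Return 1.0 for identical words, a table score for known pairs, else 0.0."""
    if a == b:
        return 1.0
    return TOY_SIMILARITY.get(frozenset((a, b)), 0.0)


def ngram_similarity(cand_gram, ref_gram):
    """Assumption: average the word similarities position by position."""
    scores = [word_similarity(a, b) for a, b in zip(cand_gram, ref_gram)]
    return sum(scores) / len(scores)


def ngrams(tokens, n):
    """List all contiguous n-grams of a token sequence as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def fuzzy_precision(candidate, reference, n):
    """Fuzzy n-gram precision of a candidate against a single reference.

    Each candidate n-gram is paired with the still-available reference
    n-gram of highest similarity; that similarity replaces the 0/1 credit
    used by exact matching.
    """
    cand_grams = ngrams(candidate, n)
    if not cand_grams:
        return 0.0
    remaining = Counter(ngrams(reference, n))  # clipping: each use is consumed
    total = 0.0
    for gram in cand_grams:
        best_gram, best_score = None, 0.0
        for ref_gram, count in remaining.items():
            if count > 0:
                score = ngram_similarity(gram, ref_gram)
                if score > best_score:
                    best_gram, best_score = ref_gram, score
        if best_gram is not None:
            remaining[best_gram] -= 1
            total += best_score
    return total / len(cand_grams)


if __name__ == "__main__":
    candidate = "the automobile is fast".split()
    reference = "the car is quick".split()
    print(fuzzy_precision(candidate, reference, 1))
    print(fuzzy_precision(candidate, reference, 2))
```

In this toy case exact matching would credit only "the" and "is" at the unigram level and no bigram at all, whereas the fuzzy variant gives partial credit to the synonym pairs. A full metric would still need BLEU's brevity penalty, the geometric mean over n-gram orders, and handling of multiple references, none of which are sketched here.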