Normalized compression distance based measures for MetricsMATR 2010

Authors:
Marcus Dobrinkat; Jaakko Väyrynen; Tero Tapiovaara; Kimmo Kettunen
Affiliations:
Aalto University School of Science and Technology, Aalto, Finland; Aalto University School of Science and Technology, Aalto, Finland; Aalto University School of Science and Technology, Aalto, Finland; Kymenlaakso University of Applied Sciences, Kotka, Finland
Venue:
WMT '10 Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
Year:
2010

Abstract

We present the MT-NCD and MT-mNCD machine translation evaluation metrics as submissions to the machine translation evaluation shared task (MetricsMATR 2010). The metrics are based on normalized compression distance (NCD), a general information-theoretic measure of string similarity, and are evaluated against human judgments from the WMT08 shared task. The experiments show that 1) flexible matching improves our metric's correlation with human judgments, 2) segment replication is effective, and 3) our NCD-inspired method for multiple references indicates improved results. Overall, the proposed MT-NCD and MT-mNCD methods correlate competitively with human judgments compared with commonly used machine translation evaluation metrics such as BLEU.
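
To make the underlying measure concrete, the following is a minimal sketch of plain NCD between a candidate translation and a single reference, using the standard definition NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(.) is the compressed length. It is not the authors' implementation: the choice of zlib as the compressor, the UTF-8 encoding, and the names `compressed_size` and `ncd` are assumptions made for illustration, and the sketch does not cover flexible matching, segment replication, or multiple references.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (compressor is an assumption)."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two strings.

    Lower values mean the strings share more compressible structure.
    """
    bx = x.encode("utf-8")
    by = y.encode("utf-8")
    cx = compressed_size(bx)
    cy = compressed_size(by)
    cxy = compressed_size(bx + by)  # compress the concatenation of both strings
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    # Hypothetical example sentences, not taken from the evaluation data.
    reference = "the cat sat on the mat near the door"
    close_candidate = "the cat sat on a mat near the door"
    unrelated_candidate = "quarterly profits fell sharply in europe"
    print(ncd(reference, close_candidate))
    print(ncd(reference, unrelated_candidate))
```

With a real compressor the distance is only an approximation: it can be slightly above zero for identical inputs, and very short strings are dominated by compressor overhead, which makes scores over longer spans of text more stable than scores for single short sentences.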