Packing it all up in search for a language independent MT quality measure tool - part two

  • Authors:
  • Kimmo Kettunen

  • Affiliations:
  • Kymenlaakso University of Applied Sciences, Kouvola, Finland

  • Venue:
  • LTC'09 Proceedings of the 4th conference on Human language technology: challenges for computer science and linguistics
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

This study describes first usage of a particular implementation of Normalized Compression Distance (NCD) as a machine translation quality evaluation tool. NCD has been introduced and tested for clustering and classification of different types of data and found a reliable and general tool. As far as we know NCD in its Complearn implementation has not been evaluated as a MT quality tool yet, and we wish to show that it can also be used for this purpose. We show that NCD scores given for MT outputs in different languages correlate highly with scores of a state-of-the-art MT evaluation metrics, METEOR 0.6. Our experiments are based on translations between one source and three target languages with a smallish sample that has available reference translations, UN's Universal Declaration of Human Rights. Secondly we shall also briefly describe and discuss results of a larger scale evaluation of NCD as an MT metric with WMT08 Shared Task Evaluation Data. These evaluations confirm further that NCD is a noteworthy MT metric both in itself and also enriched with basic language tools, stemming and Wordnet.