Character-level machine translation evaluation for languages with ambiguous word boundaries

  • Authors: Chang Liu; Hwee Tou Ng

  • Affiliations: National University of Singapore, Singapore (both authors)

  • Venue: ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
  • Year: 2012

Abstract

In this work, we introduce the TESLA-CELAB metric (Translation Evaluation of Sentences with Linear-programming-based Analysis -- Character-level Evaluation for Languages with Ambiguous word Boundaries) for automatic machine translation evaluation. For languages such as Chinese, where words usually have meaningful internal structure and word boundaries are often fuzzy, TESLA-CELAB acknowledges the advantage of character-level evaluation over word-level evaluation. By reformulating the problem in the linear programming framework, TESLA-CELAB addresses several drawbacks of character-level metrics, in particular the modeling of synonyms spanning multiple characters. We show empirically that TESLA-CELAB significantly outperforms character-level BLEU on English-Chinese translation evaluation tasks.
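
To make the idea of character-level evaluation concrete, the following is a minimal sketch of clipped character n-gram precision, the building block of a character-level BLEU baseline. It is not TESLA-CELAB itself and involves no linear programming or synonym matching. The function names `char_ngrams` and `clipped_precision` and the example sentences are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams, ignoring whitespace (hypothetical helper).

    Dropping whitespace means the result does not depend on how the
    sentence was segmented into words.
    """
    chars = [c for c in text if not c.isspace()]
    return Counter(tuple(chars[i:i + n]) for i in range(len(chars) - n + 1))


def clipped_precision(candidate, reference, n):
    """Clipped n-gram precision of a candidate against one reference.

    Each candidate n-gram is credited at most as many times as it
    occurs in the reference, as in standard BLEU.
    """
    cand = char_ngrams(candidate, n)
    ref = char_ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


if __name__ == "__main__":
    reference = "我买了一本书"
    candidate = "我买了书"
    print(clipped_precision(candidate, reference, 1))
    print(clipped_precision(candidate, reference, 2))
```

Because whitespace is discarded before counting, two different word segmentations of the same candidate receive the same score, which is the property that motivates character-level evaluation when word boundaries are ambiguous. A metric of this kind only rewards exact character matches, so a multi-character synonym of a reference word earns no credit; the abstract names this as one of the drawbacks TESLA-CELAB targets.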