How to Choose the Best Pivot Language for Automatic Translation of Low-Resource Languages

  • Authors:
  • Michael Paul;Andrew Finch;Eiichiro Sumita

  • Affiliations:
  • National Institute of Information and Communications Technology;National Institute of Information and Communications Technology;National Institute of Information and Communications Technology

  • Venue:
  • ACM Transactions on Asian Language Information Processing (TALIP)
  • Year:
  • 2013

Abstract

Recent research on multilingual statistical machine translation focuses on the use of pivot languages to overcome language resource limitations for certain language pairs. Because of the richness of available language resources, English is generally the pivot language of choice. However, factors such as language relatedness can also affect the choice of pivot language for a given language pair, especially for Asian languages, where language resources are currently quite limited. In this article, we provide new insights into what factors make a pivot language effective and investigate the impact of these factors on overall pivot translation performance for translation between 22 Indo-European and Asian languages. Experimental results using state-of-the-art statistical machine translation techniques revealed that the translation quality of 54.8% of the language pairs improved when a non-English pivot language was chosen. Moreover, 81.0% of system performance variations can be explained by a combination of factors such as language family, vocabulary, sentence length, language perplexity, translation model entropy, reordering, monotonicity, and engine performance.