Dialect translation: integrating Bayesian co-segmentation models with pivot-based SMT

  • Authors:
  • Michael Paul;Andrew Finch;Paul R. Dixon;Eiichiro Sumita

  • Affiliations:
  • National Institute of Information and Communications Technology, Kyoto, Japan (all four authors)

  • Venue:
  • DIALECTS '11 Proceedings of the First Workshop on Algorithms and Resources for Modelling of Dialects and Language Varieties
  • Year:
  • 2011

Abstract

Recent research on multilingual statistical machine translation (SMT) has focused on using pivot languages to overcome resource limitations for certain language pairs. This paper proposes a new method for translating a dialect into a foreign language by integrating transliteration approaches based on Bayesian co-segmentation (BCS) models with pivot-based SMT approaches. The proposed method has three advantages over standard SMT approaches: (1) it uses a standard language as the pivot language and automatically acquires knowledge about the relation between the dialects and the standard language; (2) it reduces the complexity of the translation task by using monotone decoding techniques; and (3) it reduces the number of features in the log-linear model that have to be estimated from bilingual data. Experiments translating four Japanese dialects (Kumamoto, Kyoto, Okinawa, Osaka) into four Indo-European languages (English, German, Russian, Hindi) and two Asian languages (Chinese, Korean) show that the proposed method improves the translation quality of dialect translation tasks and, for the majority of the investigated language pairs, outperforms standard pivot translation approaches that concatenate SMT engines.
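
The two-stage structure described in the abstract (dialect to standard language by a monotone, transliteration-style mapping, then standard language to the foreign language) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the toy lookup tables, and the greedy longest-match rule are hypothetical stand-ins. In particular, the paper learns segment pairs with a Bayesian co-segmentation model and uses a full SMT engine for the second stage; neither is implemented here.

```python
from typing import Dict, List

# Hypothetical toy tables. The entries are illustrative only and are not
# taken from the paper's data; a BCS model would learn such pairs from text.
DIALECT_TO_STANDARD: Dict[str, str] = {
    "ookini": "arigatou",  # Kansai-dialect "thank you" -> standard Japanese
    "akan": "dame",        # Kansai-dialect "no good" -> standard Japanese
}
STANDARD_TO_ENGLISH: Dict[str, str] = {
    "arigatou": "thank you",
    "dame": "no good",
}


def monotone_map(text: str, table: Dict[str, str]) -> str:
    """Replace segments left to right with no reordering (monotone).

    Greedy longest match is a simple stand-in for decoding with learned
    co-segmentation pairs; it is not the Bayesian model itself.
    """
    out: List[str] = []
    max_len = max((len(key) for key in table), default=1)
    i = 0
    while i < len(text):
        # Try the longest candidate segment first, then shorter ones.
        for n in range(min(max_len, len(text) - i), 0, -1):
            segment = text[i:i + n]
            if segment in table:
                out.append(table[segment])
                i += n
                break
        else:
            # No known segment starts here: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


def pivot_translate(dialect_text: str) -> str:
    """Dialect -> standard language (pivot) -> foreign language."""
    standard = monotone_map(dialect_text, DIALECT_TO_STANDARD)
    # Word-by-word lookup stands in for a full SMT engine.
    return " ".join(STANDARD_TO_ENGLISH.get(word, word) for word in standard.split())
```

Calling `pivot_translate("akan ookini")` first maps the input to the pivot string `dame arigatou` and then looks up each pivot word. Because the first stage never reorders segments, it needs no distortion model, which is the kind of simplification the abstract attributes to monotone decoding.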