Mining a comparable text corpus for a Vietnamese-French statistical machine translation system

  • Authors:
  • Thi-Ngoc-Diep Do; Viet-Bac Le; Brigitte Bigi; Laurent Besacier; Eric Castelli

  • Affiliations:
  • LIG Laboratory, Grenoble, France and MICA Center, Hanoi, Vietnam; LIG Laboratory, Grenoble, France; LIG Laboratory, Grenoble, France; LIG Laboratory, Grenoble, France; MICA Center, Hanoi, Vietnam

  • Venue:
  • StatMT '09 Proceedings of the Fourth Workshop on Statistical Machine Translation
  • Year:
  • 2009

Abstract

This paper presents our first attempt at constructing a Vietnamese-French statistical machine translation system. Since Vietnamese is an under-resourced language, we concentrate on building a large Vietnamese-French parallel corpus. We propose a document alignment method based on publication date, special words, and sentence alignment results. We then apply the resulting parallel corpus to build a Vietnamese-French statistical machine translation system and discuss the use of different units for Vietnamese (syllables, words, or their combinations).
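
To make the document alignment idea more concrete, the following is a minimal sketch of how candidate document pairs might be filtered using two of the signals named in the abstract: publication date and special words. It is not the authors' implementation. The `Doc` class, the `candidate_pairs` function, the one-day date window, the 0.5 overlap threshold, and the approximation of "special words" by digit sequences are all assumptions made for illustration. The third signal, sentence alignment results, is not sketched here.

```python
import re
from dataclasses import dataclass
from datetime import date


@dataclass
class Doc:
    # Hypothetical container for one news article.
    doc_id: str
    published: date
    text: str


# Assumption: "special words" are approximated by digit sequences,
# which are written the same way in Vietnamese and French text.
SPECIAL = re.compile(r"\d+")


def special_words(text):
    """Return the set of language-independent tokens found in the text."""
    return set(SPECIAL.findall(text))


def candidate_pairs(vi_docs, fr_docs, max_days=1, min_overlap=0.5):
    """Pair documents published within max_days of each other whose
    special-word sets have a Jaccard overlap of at least min_overlap."""
    pairs = []
    for vi in vi_docs:
        vi_special = special_words(vi.text)
        for fr in fr_docs:
            # Filter 1: publication dates must be close.
            if abs((vi.published - fr.published).days) > max_days:
                continue
            # Filter 2: the documents must share enough special words.
            fr_special = special_words(fr.text)
            union = vi_special | fr_special
            if not union:
                continue
            overlap = len(vi_special & fr_special) / len(union)
            if overlap >= min_overlap:
                pairs.append((vi.doc_id, fr.doc_id, overlap))
    # Highest-overlap pairs first.
    return sorted(pairs, key=lambda p: p[2], reverse=True)
```

In a pipeline of this kind, the surviving pairs would then be passed to a sentence aligner, whose output could serve as the final check on whether two documents are translations of each other.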