Two-Stage Hypotheses Generation for Spoken Language Translation

  • Authors:
  • Boxing Chen;Min Zhang;Ai Ti Aw

  • Affiliations:
  • Institute for Infocomm Research, Singapore;Institute for Infocomm Research, Singapore;Institute for Infocomm Research, Singapore

  • Venue:
  • ACM Transactions on Asian Language Information Processing (TALIP)
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

Spoken Language Translation (SLT) is the research area that focuses on the translation of speech or text between two spoken languages. Phrase-based and syntax-based methods represent the state-of-the-art for statistical machine translation (SMT). The phrase-based method specializes in modeling local reorderings and translations of multiword expressions. The syntax-based method is enhanced by using syntactic knowledge, which can better model long word reorderings, discontinuous phrases, and syntactic structure. In this article, we leverage on the strength of these two methods and propose a strategy based on multiple hypotheses generation in a two-stage framework for spoken language translation. The hypotheses are generated in two stages, namely, decoding and regeneration. In the decoding stage, we apply state-of-the-art, phrase-based, and syntax-based methods to generate basic translation hypotheses. Then in the regeneration stage, much more hypotheses that cannot be captured by the decoding algorithms are produced from the basic hypotheses. We study three regeneration methods: redecoding, n-gram expansion, and confusion network in the second stage. Finally, an additional reranking pass is introduced to select the translation outputs by a linear combination of rescoring models. Experimental results on the Chinese-to-English IWSLT-2006 challenge task of translating the transcription of spontaneous speech show that the proposed mechanism achieves significant improvements over the baseline of about 2.80 BLEU-score.