Bilingual segmentation for alignment and translation

  • Authors:
  • Chung-Chi Huang;Wei-Teh Chen;Jason S. Chang

  • Affiliations:
  • Information Systems and Applications, NTHU, HsingChu, Taiwan, R.O.C.;Information Systems and Applications, NTHU, HsingChu, Taiwan, R.O.C.;Information Systems and Applications, NTHU, HsingChu, Taiwan, R.O.C.

  • Venue:
  • CICLing'08 Proceedings of the 9th international conference on Computational linguistics and intelligent text processing
  • Year:
  • 2008

Abstract

We propose a method that bilingually segments sentences in languages with no clear delimiter for word boundaries. In our model, we first convert the search for the segmentation into a sequential tagging problem, which allows a polynomial-time dynamic-programming solution, and we incorporate a control that balances the available monolingual and bilingual information. Our bilingual segmentation algorithm, which integrates a monolingual language model and a statistical translation model, is devised to tokenize sentences in a way better suited to bilingual applications such as word alignment and machine translation. Empirical results show that bilingually motivated segmenters outperform a purely monolingual one in both word alignment (12% reduction in error rate) and translation (5% improvement in BLEU), suggesting that monolingual segmentation is useful in some respects but is not designed for bilingual research.
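
The abstract gives no formulas, so the following is only a minimal sketch of the general idea: a dynamic program that picks the segmentation maximizing a weighted mix of a monolingual score and a bilingual score. It is a simplified word-based search, not the authors' tagging formulation. The function names, the interpolation weight `lam`, the length-based penalty for unknown words, and the toy score tables are all assumptions made for illustration.

```python
import math

def segment(chars, mono_logp, bi_logp, lam=0.5, max_len=4):
    """Best segmentation of `chars` under an interpolated score (hypothetical model).

    mono_logp / bi_logp: functions mapping a candidate word to a log-score.
    lam: assumed control weight; 0.0 uses only the monolingual score,
         1.0 uses only the bilingual score.
    """
    n = len(chars)
    best = [-math.inf] * (n + 1)  # best[i]: best score for chars[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last word in chars[:i]
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = chars[start:end]
            score = best[start] + (1 - lam) * mono_logp(word) + lam * bi_logp(word)
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Follow the back-pointers from the end of the string to recover the words.
    words = []
    i = n
    while i > 0:
        words.append(chars[back[i]:i])
        i = back[i]
    return words[::-1]

def table_scorer(table, unknown_per_char=-5.0):
    """Wrap a toy {word: log-score} table; unknown words get a length-based penalty."""
    return lambda word: table.get(word, unknown_per_char * len(word))

# Toy, made-up scores: the monolingual table favors "ab" + "cd",
# while the bilingual table favors "a" + "bcd".
mono = table_scorer({"ab": -1.0, "cd": -1.0})
bi = table_scorer({"a": -1.0, "bcd": -1.0})

mono_only = segment("abcd", mono, bi, lam=0.0)
bi_only = segment("abcd", mono, bi, lam=1.0)
```

With these toy tables, `lam=0.0` yields the monolingually preferred split and `lam=1.0` the bilingually preferred one; intermediate values trade the two off. The search examines at most `max_len` candidate words per position, so it runs in time linear in the sentence length for a fixed `max_len`.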