Word alignment for languages with scarce resources using bilingual corpora of other language pairs

  • Authors:
  • Haifeng Wang;Hua Wu;Zhanyi Liu

  • Affiliations:
  • Toshiba (China) Research and Development Center, Dong Cheng District, Beijing, China;Toshiba (China) Research and Development Center, Dong Cheng District, Beijing, China;Toshiba (China) Research and Development Center, Dong Cheng District, Beijing, China

  • Venue:
  • COLING-ACL '06 Proceedings of the COLING/ACL on Main conference poster sessions
  • Year:
  • 2006

Abstract

This paper proposes an approach to improve word alignment for languages with scarce resources by using bilingual corpora of other language pairs. To perform word alignment between languages L1 and L2, we introduce a third language, L3. Although only a small amount of bilingual data is available for the desired language pair L1-L2, large-scale bilingual corpora are available for L1-L3 and L2-L3. Based on these two additional corpora, and with L3 as the pivot language, we build a word alignment model for L1 and L2. This approach can build a word alignment model for two languages even if no bilingual corpus is available for that language pair. In addition, we build another word alignment model for L1 and L2 using the small L1-L2 bilingual corpus. We then interpolate the two models to further improve word alignment between L1 and L2. Experimental results indicate a relative error rate reduction of 21.30% compared with a method that uses only the small L1-L2 bilingual corpus.
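
The abstract does not spell out how the pivot-based model is induced or how the two models are combined. The following is a minimal sketch under stated assumptions: word-level translation tables are stored as nested dictionaries, the pivot model is obtained by marginalizing over L3 words, and the two models are combined by linear interpolation with a weight `lam`. The function names, the toy English/French/Spanish probabilities, and the value of `lam` are illustrative assumptions and are not taken from the paper.

```python
from collections import defaultdict


def pivot_translation_table(t_l1_l3, t_l3_l2):
    """Induce p(w2 | w1) as the sum over pivot words w3 of p(w3 | w1) * p(w2 | w3).

    t_l1_l3[w1][w3] holds p(w3 | w1); t_l3_l2[w3][w2] holds p(w2 | w3).
    Both tables are assumed to come from models trained on the large
    L1-L3 and L3-L2 corpora.
    """
    induced = defaultdict(lambda: defaultdict(float))
    for w1, pivots in t_l1_l3.items():
        for w3, p_w3_given_w1 in pivots.items():
            # Pivot words with no entry in the second table contribute nothing.
            for w2, p_w2_given_w3 in t_l3_l2.get(w3, {}).items():
                induced[w1][w2] += p_w3_given_w1 * p_w2_given_w3
    return {w1: dict(targets) for w1, targets in induced.items()}


def interpolate(t_small, t_pivot, lam=0.5):
    """Linearly interpolate two tables: lam * t_small + (1 - lam) * t_pivot.

    A word pair missing from one table is treated as having probability 0
    in that table.
    """
    combined = {}
    for w1 in set(t_small) | set(t_pivot):
        small = t_small.get(w1, {})
        pivot = t_pivot.get(w1, {})
        combined[w1] = {
            w2: lam * small.get(w2, 0.0) + (1.0 - lam) * pivot.get(w2, 0.0)
            for w2 in set(small) | set(pivot)
        }
    return combined


# Toy tables (hypothetical values): L1 = English, L3 = French, L2 = Spanish.
t_en_fr = {"house": {"maison": 0.8, "domicile": 0.2}}
t_fr_es = {
    "maison": {"casa": 0.9, "hogar": 0.1},
    "domicile": {"casa": 0.5, "hogar": 0.5},
}
t_en_es_small = {"house": {"casa": 1.0}}  # stands in for the small L1-L2 model

induced = pivot_translation_table(t_en_fr, t_fr_es)
combined = interpolate(t_en_es_small, induced, lam=0.5)
print(induced)
print(combined)
```

In this sketch the induced table stays a proper distribution for each source word as long as every pivot word it translates to has a normalized entry in the second table. In practice the interpolation weight would be tuned on held-out data rather than fixed at 0.5.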