Aligning more words with high precision for small bilingual corpora

  • Authors: Sur-Jin Ker; Jason J. S. Chang

  • Affiliations: National Tsing Hua University, Hsinchu, Taiwan, ROC (both authors)

  • Venue: COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 1
  • Year: 1996

Abstract

In this paper, we propose an algorithm for aligning words with their translations in a bilingual corpus. Conventional algorithms are based on word-by-word models, which require bilingual data with hundreds of thousands of sentences for training. Under a word-based approach, less frequent words or words with diverse translations generally lack statistically significant evidence for confident alignment. Consequently, incomplete or incorrect alignments occur. Our algorithm attempts to handle the problem using class-based rules, which are automatically acquired from bilingual materials such as a bilingual corpus or a machine-readable dictionary. The procedure for acquiring these rules is also described. We found that the algorithm can align over 80% of word pairs while maintaining a comparably high precision rate, even when a small corpus was used in training. The algorithm also offers the advantage of producing a tagged corpus for word sense disambiguation.
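
The abstract does not spell out the rules or the acquisition procedure, so the following is only a minimal sketch of the general idea of class-based alignment, not the authors' algorithm. It assumes that every word maps to a semantic class, that a "rule" is a pair of source and target classes counted over seed translation pairs, and that linking is greedy. The class lexicons, the English-French toy vocabulary, and the names `acquire_rules` and `align` are all hypothetical.

```python
from collections import Counter

# Hypothetical toy lexicons (not from the paper): word -> semantic class.
SRC_CLASS = {"bank": "FINANCE", "loan": "FINANCE", "river": "NATURE", "tree": "NATURE"}
TGT_CLASS = {"banque": "FIN", "emprunt": "FIN", "fleuve": "NAT", "arbre": "NAT"}


def acquire_rules(seed_pairs):
    """Count how often each (source class, target class) pair occurs in seed translation pairs."""
    counts = Counter()
    for src, tgt in seed_pairs:
        if src in SRC_CLASS and tgt in TGT_CLASS:
            counts[(SRC_CLASS[src], TGT_CLASS[tgt])] += 1
    return counts


def align(src_words, tgt_words, rules, min_count=1):
    """Greedily link each source word to the unused target word with the best-supported class rule."""
    links = []
    used = set()
    for s in src_words:
        best, best_score = None, 0
        for t in tgt_words:
            if t in used:
                continue
            score = rules.get((SRC_CLASS.get(s), TGT_CLASS.get(t)), 0)
            if score >= min_count and score > best_score:
                best, best_score = t, score
        if best is not None:
            links.append((s, best))
            used.add(best)
    return links


# Seed pairs could come from a dictionary or a small aligned corpus (assumed here).
seed = [("bank", "banque"), ("river", "fleuve")]
rules = acquire_rules(seed)

# "loan" and "tree" never occur in the seed pairs, yet they are linked via their classes.
links = align(["loan", "tree"], ["arbre", "emprunt"], rules)
print(links)
```

The sketch illustrates why class-level evidence can help with small corpora: a word that was never observed in training can still be aligned if its class has been seen with the class of a candidate translation.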