Alignment and Matching of Bilingual English–Chinese News Texts

  • Authors:
  • Donghua Xu; Chew Lim Tan

  • Affiliations:
  • Department of Computer Science, National University of Singapore, Singapore 119260 (xu@cc.gatech.edu); Department of Computer Science, National University of Singapore, Singapore 119260 (tancl@comp.nus.edu.sg)

  • Venue:
  • Machine Translation
  • Year:
  • 1999

Abstract

This paper presents a project to align and match bilingual English–Chinese news files downloaded from the China News Service's website. The work involves the alignment of bilingual texts at the sentence and clause levels. In addition, the work requires matching of files, as the English and Chinese news files downloaded from the web do not come in the same sequential order. These news files have their own characteristics and, furthermore, the issue of file matching has its unique difficulties apart from the known problems of alignment work previously reported in the literature. To align the news files we combine the criteria of "anchors" (i.e. unambiguous corresponding text elements) and sentence length. We employ Dynamic Programming first to align at the paragraph level, then to align at the sentence-clause level. The precision and recall of the alignment are satisfactory for free translation texts. To match English and Chinese files, we make use of the anchors alone. In file matching we encounter a "collision" problem due to contending matching candidates, and propose a recursive splitting algorithm to resolve the problem. We allow human intervention to improve the precision of matching, and achieve 100% precision with a fairly small amount of manual effort. Finally, to determine the various parameters used in aligning and matching, we utilize a Genetic Algorithm software package to obtain their optimized values.
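The combination of anchors and length in a Dynamic Programming alignment can be illustrated with a minimal sketch. This is not the authors' implementation: the anchor definition (digit strings shared by both texts), the cost function, the set of allowed groupings, and all constants (`LENGTH_RATIO`, `ANCHOR_BONUS`, `SKIP_PENALTY`, `MERGE_PENALTY`) are assumptions chosen for illustration, whereas the paper tunes its own parameters with a Genetic Algorithm.

```python
import re

# All constants below are illustrative assumptions, not values from the paper.
LENGTH_RATIO = 0.6   # assumed Chinese characters per English character
ANCHOR_BONUS = 2.0   # assumed reward for each anchor shared by both sides
SKIP_PENALTY = 3.0   # assumed cost of leaving a segment unaligned
MERGE_PENALTY = 0.5  # assumed cost for each extra segment merged into a group

# Allowed groupings (English count, Chinese count); also an assumption.
BEADS = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]


def anchors(text):
    """Treat digit strings as anchors (a simplifying assumption)."""
    return set(re.findall(r"\d+", text))


def bead_cost(en, zh):
    """Cost of aligning a group of English segments with a group of Chinese ones."""
    if not en or not zh:
        return SKIP_PENALTY * (len(en) + len(zh))
    en_text, zh_text = " ".join(en), "".join(zh)
    expected = len(en_text) * LENGTH_RATIO
    length_cost = abs(len(zh_text) - expected) / max(expected, 1.0)
    shared = len(anchors(en_text) & anchors(zh_text))
    merge_cost = MERGE_PENALTY * (len(en) + len(zh) - 2)
    return length_cost + merge_cost - ANCHOR_BONUS * shared


def align(en_segs, zh_segs):
    """Return the minimum-cost alignment as a list of (en_indices, zh_indices)."""
    n, m = len(en_segs), len(zh_segs)
    inf = float("inf")
    best = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    # Row-major order guarantees every predecessor cell is final before use.
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == inf:
                continue
            for di, dj in BEADS:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                cost = best[i][j] + bead_cost(en_segs[i:ni], zh_segs[j:nj])
                if cost < best[ni][nj]:
                    best[ni][nj] = cost
                    back[ni][nj] = (i, j)
    # Follow the back-pointers from the end to recover the groups.
    pairs = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        pairs.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    pairs.reverse()
    return pairs
```

Under these assumptions the same routine could be run twice, once on paragraphs and then on the sentences or clauses inside each aligned paragraph pair, mirroring the two-level procedure described in the abstract.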