Bilingual sentence alignment based on punctuation statistics and lexicon

  • Authors:
  • Thomas C. Chuang; Jian-Cheng Wu; Tracy Lin; Wen-Chie Shei; Jason S. Chang

  • Affiliations:
  • Department of Computer Science, Vanung University, Chung-Li, Tao-Yuan; Department of Computer Science, National Tsing Hua University, Hsinchu; Department of Telecommunication, National Chiao Tung University, Hsinchu; Department of Computer Science, National Tsing Hua University, Hsinchu; Department of Computer Science, National Tsing Hua University, Hsinchu

  • Venue:
  • IJCNLP'04 Proceedings of the First international joint conference on Natural Language Processing
  • Year:
  • 2004

Abstract

This paper presents a new method for aligning bilingual parallel texts based on punctuation statistics and lexical information. Sentence alignment is reportedly more difficult for bilingual texts in disparate language pairs such as English and Chinese. We examine the feasibility of using punctuation for high-accuracy sentence alignment and show that punctuation statistics are an effective means of achieving good results: an encouraging precision rate is obtained when aligning sentences in bilingual parallel corpora based solely on punctuation statistics, and results improve when both punctuation statistics and lexical information are used. We evaluated an implementation of the proposed method on the parallel corpora of Sinorama Magazine and Records of the Hong Kong Legislative Council, with satisfactory results.
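
The abstract does not spell out the alignment model, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a generic dynamic-programming aligner in which the usual sentence-length cost is swapped for the difference in punctuation counts. The names `punct_count`, `BEADS`, and `align`, the punctuation inventory, and the penalty values are all hypothetical choices made for illustration, and no lexical information is used.

```python
import string

# Assumed punctuation inventory: ASCII marks plus a few fullwidth/CJK marks
# (comma, ideographic full stop, enumeration comma, semicolon, colon,
# question mark, exclamation mark). Illustrative only.
PUNCT = set(string.punctuation) | set("\uff0c\u3002\u3001\uff1b\uff1a\uff1f\uff01")

def punct_count(sentence):
    """Number of punctuation characters in a sentence."""
    return sum(1 for ch in sentence if ch in PUNCT)

# Allowed alignment beads (source sentences, target sentences), each with a
# hypothetical fixed penalty that favours 1-1 matches.
BEADS = {(1, 1): 0.0, (1, 2): 1.0, (2, 1): 1.0, (1, 0): 3.0, (0, 1): 3.0}

def align(src, tgt):
    """Return the minimum-cost list of (src_indices, tgt_indices) beads."""
    s = [punct_count(x) for x in src]
    t = [punct_count(x) for x in tgt]
    n, m = len(s), len(t)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in BEADS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                # Bead cost: mismatch in punctuation counts plus bead penalty.
                diff = abs(sum(s[i:ni]) - sum(t[j:nj]))
                c = cost[i][j] + penalty + diff
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Trace back from the end of both texts to recover the beads.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    return beads[::-1]
```

Calling `align(english_sentences, chinese_sentences)` on two lists of sentence strings returns index groups such as a 1-2 bead when one source sentence matches two target sentences. A real system would presumably replace the absolute count difference with a probabilistic score and add the lexical cues the abstract mentions.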