High-performance bilingual text alignment using statistical and dictionary information

  • Authors:
  • Masahiko Haruno; Takefumi Yamazaki

  • Affiliations:
  • NTT Communication Science Labs., Take Yokosuka-Shi, Kanagawa, Japan; NTT Communication Science Labs., Take Yokosuka-Shi, Kanagawa, Japan

  • Venue:
  • ACL '96 Proceedings of the 34th annual meeting on Association for Computational Linguistics
  • Year:
  • 1996

Abstract

This paper describes an accurate and robust text alignment system for structurally different languages. For structurally different language pairs such as Japanese and English, only a limited number of word correspondences can be acquired statistically. The proposed method therefore uses two kinds of word correspondences to align bilingual texts: a general-purpose bilingual dictionary, and word correspondences acquired statistically during the alignment process. The method determines corresponding sentence pairs (anchors) gradually, by relaxing parameters. By combining the two kinds of word correspondences, it obtains enough word correspondences for complete alignment. As a result, texts of various lengths and genres in structurally different languages can be aligned with high precision. Experimental results show that the system outperforms conventional methods on various kinds of Japanese-English texts.
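
The idea of fixing anchors gradually while relaxing parameters can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the scoring function, the statistics used, or the parameter schedule, so everything below is an assumption made for illustration. The names `pair_score`, `learn_pairs`, `is_monotone`, and `align`, the co-occurrence count standing in for the statistical acquisition step, the threshold values, and the toy romanized sentences are all hypothetical.

```python
from collections import Counter
from itertools import product


def pair_score(src, tgt, lexicon):
    """Fraction of source tokens that have a known counterpart in tgt."""
    if not src:
        return 0.0
    tgt_set = set(tgt)
    hits = sum(1 for w in src if lexicon.get(w, set()) & tgt_set)
    return hits / len(src)


def learn_pairs(src_sents, tgt_sents, anchors, min_count=2):
    """Toy statistical step (assumption): keep word pairs that co-occur
    in at least min_count of the anchors found so far."""
    counts = Counter()
    for i, j in anchors:
        for pair in product(set(src_sents[i]), set(tgt_sents[j])):
            counts[pair] += 1
    learned = {}
    for (s, t), c in counts.items():
        if c >= min_count:
            learned.setdefault(s, set()).add(t)
    return learned


def is_monotone(anchors, i, j):
    """True if (i, j) neither crosses nor reuses an existing anchor."""
    return all((i < a and j < b) or (i > a and j > b) for a, b in anchors)


def align(src_sents, tgt_sents, dictionary, thresholds=(1.0, 0.75, 0.5)):
    """Greedy anchor selection with a step-by-step relaxed threshold."""
    anchors = []
    for th in thresholds:
        # Combine the general dictionary with pairs learned from anchors.
        lexicon = {w: set(ts) for w, ts in dictionary.items()}
        for w, ts in learn_pairs(src_sents, tgt_sents, anchors).items():
            lexicon.setdefault(w, set()).update(ts)
        scored = sorted(
            ((pair_score(s, t, lexicon), i, j)
             for (i, s), (j, t) in product(enumerate(src_sents),
                                           enumerate(tgt_sents))),
            key=lambda x: -x[0],
        )
        for score, i, j in scored:
            if score >= th and is_monotone(anchors, i, j):
                anchors.append((i, j))
    return sorted(anchors)


# Toy data (hypothetical): content words only, romanized Japanese vs. English.
src = [["neko", "sakana"], ["inu", "hone"], ["neko", "hone"]]
tgt = [["cat", "fish"], ["dog", "bone"], ["cat", "bone"]]
dictionary = {"neko": {"cat"}, "inu": {"dog"}, "sakana": {"fish"}}  # no "hone"

result = align(src, tgt, dictionary)
print(result)
```

On this toy input, only the first sentence pair passes the strictest threshold, because "hone" is missing from the dictionary. The other two pairs are accepted once the threshold is relaxed to 0.5. Calling `learn_pairs(src, tgt, [(1, 1), (2, 2)])` then yields the pair hone/bone, which shows how correspondences absent from the dictionary could be picked up from the anchors themselves.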