An experiment in hybrid dictionary and statistical sentence alignment

  • Authors:
  • Nigel Collier; Kenji Ono; Hideki Hirakawa

  • Affiliations:
  • Toshiba Corporation, Kanagawa, Japan; Toshiba Corporation, Kanagawa, Japan; Toshiba Corporation, Kanagawa, Japan

  • Venue:
  • COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
  • Year:
  • 1998

Abstract

The task of aligning sentences in parallel corpora of two languages has been well studied using purely statistical or purely linguistic models. We developed a linguistic method based on lexical matching with a bilingual dictionary, and two statistical methods based on sentence length ratios and sentence offset probabilities. This paper seeks to further our knowledge of the alignment task by comparing the performance of the alignment models when used separately and when used together as a hybrid system. Our results show that, for our English-Japanese corpus of newspaper articles, the hybrid system using lexical matching and sentence length ratios outperforms the pure methods.
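
The abstract does not give the scoring formulas, so the following is only a minimal sketch of how a dictionary-based score and a length-ratio score might be combined for one candidate sentence pair. Everything in it is an assumption made for illustration, not the authors' method: the function names, the Gaussian shape of the length score, the `mean_ratio` and `std` parameters, and the linear `weight` used to mix the two scores. Sentence offset probabilities are left out.

```python
import math


def lexical_score(src_tokens, tgt_tokens, dictionary):
    """Fraction of source tokens that have a dictionary translation in the target.

    `dictionary` maps a source word to a set of target words (hypothetical format).
    """
    if not src_tokens:
        return 0.0
    tgt = set(tgt_tokens)
    hits = sum(1 for word in src_tokens if dictionary.get(word, set()) & tgt)
    return hits / len(src_tokens)


def length_score(src_len, tgt_len, mean_ratio=1.0, std=0.5):
    """Score in (0, 1] that peaks when tgt_len / src_len equals mean_ratio.

    The Gaussian shape and the default parameters are illustrative assumptions.
    """
    if src_len == 0:
        return 0.0
    z = (tgt_len / src_len - mean_ratio) / std
    return math.exp(-0.5 * z * z)


def hybrid_score(src_tokens, tgt_tokens, dictionary, weight=0.5):
    """Linear mix of the lexical and length scores (assumed combination rule)."""
    lex = lexical_score(src_tokens, tgt_tokens, dictionary)
    length = length_score(len(src_tokens), len(tgt_tokens))
    return weight * lex + (1.0 - weight) * length


if __name__ == "__main__":
    # Toy dictionary with romanized Japanese, for illustration only.
    toy_dictionary = {"cat": {"neko"}, "eats": {"taberu"}}
    source = ["the", "cat", "eats"]
    good_target = ["neko", "ga", "taberu"]
    bad_target = ["inu"]
    print(hybrid_score(source, good_target, toy_dictionary))
    print(hybrid_score(source, bad_target, toy_dictionary))
```

In this toy setup the pair that shares dictionary translations and has a similar length receives the higher score. A full aligner would additionally search over candidate pairings (for example with dynamic programming) rather than score pairs in isolation.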