Aligning sentences in bilingual corpora using lexical information

  • Authors:
  • Stanley F. Chen

  • Affiliations:
  • Harvard University, Cambridge, MA

  • Venue:
  • ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics
  • Year:
  • 1993

Quantified Score

Hi-index 0.00

Visualization

Abstract

In this paper, we describe a fast algorithm for aligning sentences with their translations in a bilingual corpus. Existing efficient algorithms ignore word identities and only consider sentence length (Brown et al., 1991b; Gale and Church, 1991). Our algorithm constructs a simple statistical word-to-word translation model on the fly during alignment. We find the alignment that maximizes the probability of generating the corpus with this translation model. We have achieved an error rate of approximately 0.4% on Canadian Hansard data, which is a significant improvement over previous results. The algorithm is language independent.