Fast and Accurate Sentence Alignment of Bilingual Corpora

  • Authors:
  • Robert C. Moore

  • Venue:
  • AMTA '02 Proceedings of the 5th Conference of the Association for Machine Translation in the Americas on Machine Translation: From Research to Real Users
  • Year:
  • 2002

Abstract

We present a new method for aligning sentences with their translations in a parallel bilingual corpus. Previous approaches have generally been based either on sentence length or word correspondences. Sentence-length-based methods are relatively fast and fairly accurate. Word-correspondence-based methods are generally more accurate but much slower, and usually depend on cognates or a bilingual lexicon. Our method adapts and combines these approaches, achieving high accuracy at a modest computational cost, and requiring no knowledge of the languages or the corpus beyond division into words and sentences.
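To make the sentence-length-based family of methods mentioned in the abstract concrete, the following is a minimal sketch of length-based alignment by dynamic programming. It is not the method of this paper: the bead types, the fixed penalties in `BEADS`, and the `length_cost` function are all hypothetical choices made for illustration, and sentence length is measured as a whitespace-delimited word count.

```python
from math import inf

# Hypothetical bead types: (source sentences used, target sentences used, fixed penalty).
# The penalty values are illustrative assumptions, not taken from the paper.
BEADS = [(1, 1, 0.0), (1, 0, 4.0), (0, 1, 4.0), (2, 1, 2.0), (1, 2, 2.0)]


def length_cost(src_len, tgt_len):
    """Illustrative mismatch cost: relative length difference, scaled to 0..10."""
    total = src_len + tgt_len
    return 0.0 if total == 0 else 10.0 * abs(src_len - tgt_len) / total


def align_by_length(src, tgt):
    """Return the lowest-cost list of (source indices, target indices) beads."""
    src_lens = [len(s.split()) for s in src]
    tgt_lens = [len(t.split()) for t in tgt]
    n, m = len(src), len(tgt)

    # best[i][j] is the lowest cost of aligning src[:i] with tgt[:j].
    best = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0

    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == inf:
                continue
            for di, dj, penalty in BEADS:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                cost = (best[i][j] + penalty
                        + length_cost(sum(src_lens[i:ni]), sum(tgt_lens[j:nj])))
                if cost < best[ni][nj]:
                    best[ni][nj] = cost
                    back[ni][nj] = (i, j)

    # Follow the back-pointers from the end of both texts to recover the beads.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads
```

Calling `align_by_length(source_sentences, target_sentences)` on two lists of sentence strings returns index pairs such as `([0], [0, 1])` for one source sentence matched to two target sentences. The table has one cell per pair of sentence positions, so the cost grows with the product of the two text lengths, which is why practical length-based aligners usually restrict the search to a band around the diagonal.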