Char_align: a program for aligning parallel texts at the character level

  • Authors:
  • Kenneth Ward Church

  • Affiliations:
  • AT&T Bell Laboratories, Murray Hill, NJ

  • Venue:
  • ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics
  • Year:
  • 1993

Quantified Score

Hi-index 0.00

Visualization

Abstract

There have been a number of recent papers on aligning parallel texts at the sentence level, e.g., Brown et al (1991), Gale and Church (to appear), Isabelle (1992), Kay and Rösenschein (to appear), Simard et al (1992), Warwick-Armstrong and Russell (1990). On clean inputs, such as the Canadian Hansards, these methods have been very successful (at least 96% correct by sentence). Unfortunately, if the input is noisy (due to OCR and/or unknown markup conventions), then these methods tend to break down because the noise can make it difficult to find paragraph boundaries, let alone sentences. This paper describes a new program, char_align, that aligns texts at the character level rather than at the sentence/paragraph level, based on the cognate approach proposed by Simard et al.