Using normalized alignment scores to detect incorrectly aligned segments

  • Authors:
  • Andreas Türk

  • Affiliations:
  • Matrixware Information Services, Vienna, Austria

  • Venue:
  • Proceedings of the 2nd International Workshop on Patent Information Retrieval
  • Year:
  • 2009

Abstract

This paper introduces a number of quality metrics that can be used to automatically detect incorrectly aligned segment pairs. This is an important issue in commercial machine translation, because the segmentation and alignment of bilingual corpora are often performed by third parties whose quality assurance cannot always be relied upon. The metrics in this paper are based on the normalized logarithm of the alignment score of a segment pair, where the alignment score is calculated using IBM translation Model 4. The alignment quality metrics are evaluated in classification experiments on a Chinese-English patent translation task and are shown to yield satisfactory performance.
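The abstract does not say how the log alignment score is normalized, so the sketch below is only an illustration under stated assumptions. It assumes the IBM Model 4 alignment log-probability of each segment pair has already been computed by an external aligner. It also assumes a hypothetical normalization that divides the log score by the combined token count of both segments, and it uses a hypothetical fixed threshold to classify pairs. The names `SegmentPair`, `normalized_log_score`, and `flag_misaligned` are invented for this example and are not the author's method.

```python
from dataclasses import dataclass


@dataclass
class SegmentPair:
    source: list[str]   # tokenized source segment (e.g. Chinese)
    target: list[str]   # tokenized target segment (e.g. English)
    log_score: float    # assumed: natural-log IBM Model 4 alignment score from an external aligner


def normalized_log_score(pair: SegmentPair) -> float:
    """Hypothetical metric: log alignment score divided by the total token count.

    Dividing by length keeps long, correctly aligned pairs from looking worse
    than short ones simply because they contain more words.
    """
    length = len(pair.source) + len(pair.target)
    if length == 0:
        return float("-inf")  # an empty pair cannot be a valid alignment
    return pair.log_score / length


def flag_misaligned(pairs: list[SegmentPair], threshold: float) -> list[bool]:
    """Flag a pair as incorrectly aligned when its normalized score is below the threshold."""
    return [normalized_log_score(p) < threshold for p in pairs]


if __name__ == "__main__":
    good = SegmentPair(["a", "b"], ["x", "y"], -8.0)
    bad = SegmentPair(["a", "b"], ["x", "y"], -40.0)
    print(flag_misaligned([good, bad], threshold=-5.0))
```

The sketch takes log-probabilities as input rather than raw probabilities. Model 4 probabilities for long segments can underflow double precision, and working in log space avoids that problem. In practice, the threshold would be tuned on labelled data, as in the classification experiments described above.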