Efficient optimization for bilingual sentence alignment based on linear regression

  • Authors:
  • Bing Zhao; Klaus Zechner; Stephan Vogel; Alex Waibel

  • Affiliations:
  • Carnegie Mellon University;Educational Testing Service, Princeton, NJ;Carnegie Mellon University;Carnegie Mellon University

  • Venue:
  • HLT-NAACL-PARALLEL '03 Proceedings of the HLT-NAACL 2003 Workshop on Building and using parallel texts: data driven machine translation and beyond - Volume 3
  • Year:
  • 2003

Abstract

This paper presents a study on optimizing the sentence pair alignment scores of a bilingual sentence alignment module. Five candidate scores based on perplexity and sentence length are introduced and tested. A linear regression model based on those candidates is then proposed and trained to predict sentence pairs' alignment quality scores solicited from human subjects. Experiments are carried out on data automatically collected from the Internet. The correlation between the scores generated by the linear regression model and the scores from human subjects falls within the range of the inter-subject agreement correlations. Pearson's correlation ranges from 0.53 to 0.72 in our experiments.
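
As a rough illustration of the approach the abstract describes, the sketch below fits a linear regression over five per-pair scores and reports Pearson's correlation against target scores on held-out pairs. It is a minimal sketch under stated assumptions, not the authors' implementation: the five feature columns are placeholders for the perplexity- and length-based candidates (the abstract does not define them), the data is synthetic rather than the paper's human judgments, and the helper names (`fit_linear_regression`, `predict`, `pearson`) are hypothetical.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_linear_regression(features, targets):
    """Ordinary least squares via the normal equations.

    The last returned weight is the intercept.
    """
    rows = [f + [1.0] for f in features]
    d = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(d)]
    return solve_linear_system(xtx, xty)


def predict(features, weights):
    """Apply the fitted weights (including intercept) to each feature row."""
    return [sum(w * v for w, v in zip(weights, f + [1.0])) for f in features]


def pearson(a, b):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Synthetic stand-in data (assumption): 200 sentence pairs, 5 candidate scores
# each, and a noisy linear "human" score. None of this comes from the paper.
rng = random.Random(0)
true_w = [0.5, -0.3, 0.2, 0.1, 0.4]
features = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(200)]
human = [sum(w * v for w, v in zip(true_w, f)) + rng.gauss(0, 0.5)
         for f in features]

# Train on the first 150 pairs, evaluate correlation on the remaining 50.
weights = fit_linear_regression(features[:150], human[:150])
r = pearson(predict(features[150:], weights), human[150:])
print(f"held-out Pearson r = {r:.2f}")
```

In a real setup, each feature row would hold the five candidate alignment scores for one sentence pair and the target would be that pair's human-assigned quality score. The resulting correlation could then be compared with the correlation between individual human judges.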