Two ways to use a noisy parallel news corpus for improving statistical machine translation

  • Authors:
  • Souhir Gahbiche-Braham; Hélène Bonneau-Maynard; François Yvon

  • Affiliations:
  • Université Paris-Sud, LIMSI-CNRS, Orsay, France (all three authors)

  • Venue:
  • BUCC '11 Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web
  • Year:
  • 2011

Abstract

In this paper, we present two methods to use a noisy parallel news corpus to improve statistical machine translation (SMT) systems. Taking full advantage of the characteristics of our corpus and of existing resources, we use a bootstrapping strategy, whereby an existing SMT engine is used both to detect parallel sentences in comparable data and to provide an adaptation corpus for translation models. MT experiments demonstrate the benefits of various combinations of these strategies.
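The bootstrapping idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how candidate sentences are compared or filtered, so the Dice token-overlap score, the 0.6 threshold, the function names (`token_overlap`, `bootstrap`), and the toy French-English data are all assumptions made for illustration. The `translate` argument is a hypothetical stand-in for the existing SMT engine.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]


def token_overlap(hyp: str, ref: str) -> float:
    """Dice coefficient over lowercased token sets (an assumed similarity measure)."""
    hyp_tokens = set(hyp.lower().split())
    ref_tokens = set(ref.lower().split())
    if not hyp_tokens or not ref_tokens:
        return 0.0
    return 2 * len(hyp_tokens & ref_tokens) / (len(hyp_tokens) + len(ref_tokens))


def bootstrap(
    source_sentences: List[str],
    target_sentences: List[str],
    translate: Callable[[str], str],
    threshold: float = 0.6,  # assumed cut-off, not taken from the paper
) -> Tuple[List[Pair], List[Pair]]:
    """Return (detected parallel pairs, automatically translated adaptation pairs)."""
    detected: List[Pair] = []
    adaptation: List[Pair] = []
    for src in source_sentences:
        hyp = translate(src)  # output of the existing SMT engine (hypothetical callable)
        # Use 2: the automatic translation itself becomes adaptation data.
        adaptation.append((src, hyp))
        if not target_sentences:
            continue
        # Use 1: pick the target sentence closest to the automatic translation.
        best = max(target_sentences, key=lambda tgt: token_overlap(hyp, tgt))
        if token_overlap(hyp, best) >= threshold:
            detected.append((src, best))
    return detected, adaptation


# Toy stand-in for an SMT engine: a lookup table, purely illustrative.
toy_table = {"le chat dort": "the cat sleeps", "il pleut": "it rains"}


def toy_translate(sentence: str) -> str:
    return toy_table.get(sentence, sentence)


detected, adaptation = bootstrap(
    ["le chat dort", "il pleut"],
    ["the cat sleeps now", "stock markets fell sharply"],
    toy_translate,
)
```

In this toy run, only the first source sentence finds a target sentence above the threshold, while both source sentences contribute an automatically translated pair. A real system would replace the lookup table with an actual SMT decoder and would likely use a more robust similarity measure.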