Bootstrapping bilingual data using consensus translation for a multilingual instant messaging system

  • Authors: Srinivas Bangalore; Vanessa Murdock; Giuseppe Riccardi
  • Affiliations: AT&T Labs-Research, Florham Park, NJ; University of Massachusetts, Amherst, MA; AT&T Labs-Research, Florham Park, NJ
  • Venue: COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
  • Year: 2002

Abstract

One of the primary issues in training statistical translation models is the paucity of bilingual data. In this paper, we propose techniques to alleviate this bilingual data bottleneck by translating monolingual data with several off-the-shelf translation engines and creating a consensus from their outputs. We compute the consensus alignment using a multi-sequence alignment algorithm of the kind used for DNA sequence alignment. We present an application of this technique to bootstrap bilingual data for the general domain of instant messaging. We train hierarchical statistical translation models on the bootstrapped bilingual data and show that the resulting statistical translation model outperforms each individual off-the-shelf translation system.
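
The consensus idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes whitespace tokenization, unit edit costs, a simple progressive alignment of each translation against a growing profile of columns, and a majority vote per column. The names `align_to_profile` and `consensus` and the example sentences are hypothetical.

```python
from collections import Counter

GAP = None  # marks "no token from this engine" in an alignment column


def align_to_profile(columns, tokens, depth):
    """Align a token list to an existing profile (a list of columns).

    Each column holds one entry per already-aligned sequence (`depth` of them).
    Assumed unit costs: 0 for a token already present in the column,
    1 for a mismatch, 1 for a gap.
    """
    n, m = len(columns), len(tokens)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if tokens[j - 1] in columns[i - 1] else 1
            cost[i][j] = min(cost[i - 1][j - 1] + sub,
                             cost[i - 1][j] + 1,
                             cost[i][j - 1] + 1)

    # Trace back from the end, extending every column by one entry.
    merged = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = 0 if tokens[j - 1] in columns[i - 1] else 1
            if cost[i][j] == cost[i - 1][j - 1] + sub:
                merged.append(columns[i - 1] + [tokens[j - 1]])
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            merged.append(columns[i - 1] + [GAP])  # new sequence skips this column
            i -= 1
        else:
            merged.append([GAP] * depth + [tokens[j - 1]])  # new column
            j -= 1
    merged.reverse()
    return merged


def consensus(translations):
    """Majority vote over the columns of a progressive alignment."""
    token_lists = [t.split() for t in translations]
    columns = [[tok] for tok in token_lists[0]]
    for depth, tokens in enumerate(token_lists[1:], start=1):
        columns = align_to_profile(columns, tokens, depth)
    winners = (Counter(col).most_common(1)[0][0] for col in columns)
    return " ".join(w for w in winners if w is not GAP)


# Hypothetical outputs of three engines for the same source sentence.
engine_outputs = [
    "i will see you the tomorrow",
    "i see you tomorrow",
    "me will see you tomorrow",
]
print(consensus(engine_outputs))
```

In this toy case the voted output differs from all three inputs, since each engine's individual error is outvoted by the other two. Ties within a column are broken arbitrarily here; a real system would need a principled tie-breaking rule and a better scoring scheme.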