Data-Oriented Translation

  • Authors: Arjen Poutsma
  • Affiliations: University of Amsterdam, the Netherlands
  • Venue: COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 2
  • Year: 2000

Abstract

In this article, we present a statistical approach to machine translation that is based on Data-Oriented Parsing: Data-Oriented Translation (DOT). In DOT, we use linked subtree pairs to create a derivation of a source sentence. Each linked subtree pair has a certain probability and consists of two trees: one in the source language and one in the target language. Once a derivation has been formed from these subtree pairs, we can create a translation from it. Since there are typically many different derivations of the same source sentence, there can be correspondingly many different translations of it. The probability of a translation is calculated as the total probability of all the derivations that yield this translation. We describe the computational aspects of this model, show that each subtree pair can be converted into a productive rewrite rule, and show that the most probable translation can be computed by means of Monte Carlo disambiguation. Finally, we discuss some pilot experiments with the Verbmobil corpus.
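
The probability model described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the derivations, their subtree-pair probabilities, and the target strings below are hypothetical toy values, and the names `DERIVATIONS`, `derivation_probability`, `translation_probabilities`, and `monte_carlo_best` are invented for illustration. The sketch assumes that the probability of a derivation is the product of the probabilities of its subtree pairs, and it enumerates all derivations up front, whereas a real system would sample derivations from a parse chart without listing them.

```python
import random
from collections import Counter, defaultdict
from math import prod

# Hypothetical toy data: each entry is one derivation of the same source
# sentence, given as (probabilities of its linked subtree pairs, target string).
DERIVATIONS = [
    ([0.5, 0.4], "target A"),
    ([0.3, 0.5], "target B"),
    ([0.2, 0.5], "target A"),
]


def derivation_probability(pair_probs):
    """Assumed here: a derivation's probability is the product of its pairs."""
    return prod(pair_probs)


def translation_probabilities(derivations):
    """Sum the probabilities of all derivations that yield the same translation."""
    totals = defaultdict(float)
    for pair_probs, target in derivations:
        totals[target] += derivation_probability(pair_probs)
    return dict(totals)


def monte_carlo_best(derivations, samples=10_000, seed=0):
    """Estimate the most probable translation by sampling derivations.

    Derivations are drawn in proportion to their probability; the translation
    produced most often among the samples is returned as the estimate.
    """
    rng = random.Random(seed)
    weights = [derivation_probability(p) for p, _ in derivations]
    targets = [t for _, t in derivations]
    counts = Counter(rng.choices(targets, weights=weights, k=samples))
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    print(translation_probabilities(DERIVATIONS))
    print(monte_carlo_best(DERIVATIONS))
```

In this toy setting the exact sum is cheap to compute, so sampling is unnecessary; it is shown only to make the idea concrete. Sampling becomes relevant when the number of derivations is too large to enumerate.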