Learning transfer rules for machine translation with limited data

  • Authors:
  • Katharina Probst; Alon Lavie

  • Affiliations:
  • Carnegie Mellon University; Carnegie Mellon University

  • Venue:
  • Learning transfer rules for machine translation with limited data
  • Year:
  • 2005

Abstract

The transfer-based approach to machine translation (MT) captures structural transfers between the source language and the target language, with the goal of producing grammatical translations. Its major drawback is the development bottleneck: rule development requires many human-years of effort. Data-driven approaches such as example-based and statistical MT, by contrast, achieve fast system development by deriving mostly non-structural translation information from bilingual corpora. This thesis aims to strike a balance between the two approaches by inferring transfer rules automatically from bilingual text, targeting scenarios where bilingual data is in short supply. The rules are learned from a variety of information, such as parses that are available for one of the languages and morphological information that is available for both languages. Learning proceeds in three stages: first producing an initial hypothesis, then capturing the syntactic structure, and finally adding appropriate unification constraints. The learned rules are used in a run-time translation system, a statistical transfer system that combines a transfer engine with a statistical decoder. We demonstrate the effectiveness of the learned rules on Hebrew→English and Hindi→English translation tasks. The main contribution of this thesis is a new framework for inferring structural information with feature constraints from bilingual text, together with an investigation of the taxonomy of learnable rules and their effectiveness. The framework is designed to be applicable to any language pair, and the inferred rules can be used in conjunction with a statistical decoder. In addition to presenting methods for integrating syntactic and statistical information, the thesis makes a case for inferring information from very small training corpora and provides methods to do so.