ON THE STATISTICAL ESTIMATION OF STOCHASTIC FINITE-STATE TRANSDUCERS IN MACHINE TRANSLATION

  • Authors:
  • Jesús Andrés-Ferrer;Francisco Casacuberta-Nolla;Alfons Juan-Císcar

  • Affiliations:
  • Departamento de Sistemas Informáticos y Computación, Universidad Politécnica Valencia, Spain;Departamento de Sistemas Informáticos y Computación, Universidad Politécnica Valencia, Spain;Departamento de Sistemas Informáticos y Computación, Universidad Politécnica Valencia, Spain

  • Venue:
  • Applied Artificial Intelligence
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

The inference of finite-state transducers from bilingual training data plays an important role in many natural-language tasks and mainly in machine translation. However, there are only a few techniques to infer such models. One of these techniques is the grammatical inference and alignments for transducer inference (GIATI) technique that has proven to be very adequate for speech translation, text-input machine translation, or computer-assisted translation. GIATI is a heuristic technique that requires segmented training data (i.e., the input sentences and the output sentences must be segmented with the restriction that the input segments and the output segments must be monotone aligned). For the purpose of obtaining segmented training data, pure statistical word-alignment models are used. This technique is revisited in this article. The main goal is to formally derive the complete GIATI technique using classical expectation-maximization statistical estimation procedure. This new approach allows us to avoid a hard dependence on heuristic "external" statistical techniques (statistical alignments and n-grams). A first set of experimental results obtained in a machine-translation task are also reported to initially validate this new version of the inference technique of finite-state transducers.