Phrase-Based Statistical Machine Translation Using Approximate Matching

  • Authors:
  • Jesús Tomás;Jaime Lloret;Francisco Casacuberta

  • Affiliations:
  • Instituto Tecnológico de Informática;Departamento de Comunicaciones, Universidad Politécnica de Valencia, 46071 Valencia, Spain;Departamento de Comunicaciones, Universidad Politécnica de Valencia, 46071 Valencia, Spain

  • Venue:
  • IbPRIA '07 Proceedings of the 3rd Iberian conference on Pattern Recognition and Image Analysis, Part I
  • Year:
  • 2007

Abstract

Phrase-based statistical models are among the most competitive pattern-recognition approaches to machine translation. In these models, the source sentence is segmented into phrases, and each phrase is then translated using a stochastic dictionary. One shortcoming of the phrase-based model is its limited generalization capability: if a sequence of words has not been seen in training, it cannot be translated as a whole phrase. In this paper we try to overcome this drawback. The basic idea is that if a source phrase is not in our dictionary (that is, it has not been seen in training), we look for the most similar phrase in the dictionary and try to adapt its translation to the source phrase. We use the well-known edit distance as the measure of similarity. We present results on an English-Spanish task (XRCE).
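
The lookup step described in the abstract can be illustrated with a minimal sketch. It assumes that the edit distance is computed over words rather than characters, which the abstract does not specify, and it covers only the search for the most similar known phrase, not the adaptation of its translation. The function names and the toy phrase table are hypothetical and are not taken from the paper.

```python
from typing import Dict, Optional, Sequence, Tuple

Phrase = Tuple[str, ...]


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (unit costs)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, token_a in enumerate(a, start=1):
        curr = [i]
        for j, token_b in enumerate(b, start=1):
            cost = 0 if token_a == token_b else 1
            curr.append(min(prev[j] + 1,          # delete token_a
                            curr[j - 1] + 1,      # insert token_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def closest_phrase(source: Phrase,
                   phrase_table: Dict[Phrase, str]) -> Optional[Tuple[Phrase, int]]:
    """Return the known source phrase nearest to `source` and its distance.

    An exact match is returned with distance 0; an empty table gives None.
    """
    if source in phrase_table:
        return source, 0
    best: Optional[Tuple[Phrase, int]] = None
    for candidate in phrase_table:
        distance = edit_distance(source, candidate)
        if best is None or distance < best[1]:
            best = (candidate, distance)
    return best


# Hypothetical toy dictionary, for illustration only (not data from the paper).
toy_table: Dict[Phrase, str] = {
    ("the", "red", "printer"): "la impresora roja",
    ("print", "the", "page"): "imprimir la página",
}

match = closest_phrase(("the", "blue", "printer"), toy_table)
```

Here the unseen phrase "the blue printer" is one substitution away from the known phrase "the red printer", so that entry is the candidate whose translation would then have to be adapted. A linear scan over the dictionary is used only for brevity; a real phrase table would likely need an indexed or pruned search.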