A semantic feature for statistical machine translation

  • Authors:
  • Rafael E. Banchs;Marta R. Costa-jussà

  • Affiliations:
  • Institute for Infocomm Research, Singapore;Barcelona Media Innovation Centre, Barcelona

  • Venue:
  • SSST-5 Proceedings of the Fifth Workshop on Syntax, Semantics and Structure in Statistical Translation
  • Year:
  • 2011

Abstract

A semantic feature for statistical machine translation, based on Latent Semantic Indexing, is proposed and evaluated. The objective of the proposed feature is to account for the degree of similarity between a given input sentence and each individual sentence in the training dataset. This similarity is computed in a reduced vector space constructed by means of the Latent Semantic Indexing decomposition. The computed similarity values are used as an additional feature in the log-linear model combination approach to statistical machine translation. In our implementation, the proposed feature is dynamically adjusted for each translation unit in the translation table according to the current input sentence to be translated. The model aims to favor translation units that were extracted from training sentences semantically related to the current input sentence. Experimental results on a Spanish-to-English translation task on the Bible corpus demonstrate a significant improvement in translation quality with respect to a baseline system.
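
The similarity computation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the raw term counts, the rank `k`, the toy sentences, and the rule of scoring a translation unit by the maximum similarity over its source training sentences are all assumptions made for illustration, and every function name is hypothetical.

```python
import numpy as np

def build_term_doc_matrix(sentences):
    """Term-by-sentence count matrix (raw counts; the weighting is an assumption)."""
    vocab = sorted({w for s in sentences for w in s.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(sentences)))
    for j, s in enumerate(sentences):
        for w in s.lower().split():
            A[index[w], j] += 1.0
    return A, index

def lsi_fit(A, k):
    """Truncated SVD: keep the k largest singular directions."""
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :k]
    docs_k = Vt[:k, :].T * S[:k]  # row j = training sentence j in the reduced space
    return U_k, docs_k

def lsi_project(sentence, index, U_k):
    """Fold a new sentence into the reduced space (unknown words are ignored)."""
    q = np.zeros(U_k.shape[0])
    for w in sentence.lower().split():
        if w in index:
            q[index[w]] += 1.0
    return q @ U_k

def cosine_to_all(q_k, docs_k):
    """Cosine similarity between the input sentence and every training sentence."""
    denom = np.linalg.norm(docs_k, axis=1) * np.linalg.norm(q_k)
    denom = np.where(denom == 0.0, 1.0, denom)  # avoid division by zero
    return (docs_k @ q_k) / denom

def unit_feature(source_sentence_ids, sims):
    """Assumed scoring rule: best similarity among the sentences a unit came from."""
    return float(np.max(sims[source_sentence_ids]))

# Toy training sentences (hypothetical, unaccented for simplicity).
train = [
    "en el principio creo dios los cielos y la tierra",
    "y dijo dios sea la luz",
    "el pastor guia a sus ovejas",
    "las ovejas oyen la voz del pastor",
]
A, index = build_term_doc_matrix(train)
U_k, docs_k = lsi_fit(A, k=2)

input_sentence = "el pastor llama a sus ovejas"
sims = cosine_to_all(lsi_project(input_sentence, index, U_k), docs_k)

# A hypothetical translation unit extracted from training sentences 0 and 2.
feature_value = unit_feature([0, 2], sims)
print(np.round(sims, 3), round(feature_value, 3))
```

In a log-linear system, a value such as `feature_value` would be recomputed for each input sentence and attached to each candidate translation unit as one more feature, alongside the usual translation and language model scores.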