Experiments with a Hindi-to-English transfer-based MT system under a miserly data scenario

  • Authors:
  • Alon Lavie;Stephan Vogel;Lori Levin;Erik Peterson;Katharina Probst;Ariadna Font Llitjós;Rachel Reynolds;Jaime Carbonell;Richard Cohen

  • Affiliations:
  • Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA (Lavie, Vogel, Levin, Peterson, Probst, Font Llitjós, Reynolds, Carbonell);University Center for International Studies, University of Pittsburgh, Pittsburgh, PA (Cohen)

  • Venue:
  • ACM Transactions on Asian Language Information Processing (TALIP)
  • Year:
  • 2003

Abstract

We describe an experiment designed to evaluate the capabilities of our trainable transfer-based (Xfer) machine translation approach, as applied to Hindi-to-English translation and trained under an extremely limited data scenario. We compare the performance of the Xfer approach with two corpus-based approaches, Statistical MT (SMT) and Example-based MT (EBMT), under the limited data scenario. The results indicate that the Xfer system significantly outperforms both EBMT and SMT in this scenario. Results also indicate that automatically learned transfer rules are effective in improving translation performance, compared with a baseline word-to-word translation version of the system. Xfer system performance with a limited number of manually written transfer rules is, however, still better than performance with the current automatically inferred rules. Furthermore, a "multiengine" version of our system that combines the output of the Xfer and SMT systems and optimizes translation selection outperformed both individual systems.