Combining EBMT, SMT, TM and IR technologies for quality and scale

  • Authors:
  • Sandipan Dandapat; Sara Morrissey; Andy Way; Joseph van Genabith

  • Affiliations:
  • CNGL, Dublin City University, Glasnevin, Dublin, Ireland; CNGL, Dublin City University, Glasnevin, Dublin, Ireland; Applied Language Solutions, Delph, UK; CNGL, Dublin City University, Glasnevin, Dublin, Ireland

  • Venue:
  • EACL 2012 Proceedings of the Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra)
  • Year:
  • 2012

Abstract

In this paper we present a hybrid system that combines statistical machine translation (SMT) with example-based MT (EBMT) and shows significant improvement over both the SMT and EBMT baseline systems. First, we present a runtime EBMT system that uses a subsentential translation memory (TM). We then combine the EBMT system with an SMT system to hybridize the two effectively. The hybrid system shows significant improvement in translation quality (0.82 and 2.75 absolute BLEU points) over the baseline SMT system for two language pairs, English–Turkish (En–Tr) and English–French (En–Fr). However, the EBMT approach has significant time complexity issues when used at runtime. We explore two methods to make the system scalable at runtime. First, we use a heuristic-based approach. Second, we use an IR-based indexing technique to speed up the time-consuming matching procedure of the EBMT system. The index-based matching procedure substantially improves runtime speed without affecting translation quality.
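
The abstract does not say how the IR-based index is built or queried, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a token-level inverted index over the TM source sentences, candidate selection by shared-token count, and word-level edit distance as the matching score. The names `build_index`, `edit_distance`, `best_match`, and `top_k`, as well as the toy sentences, are all hypothetical.

```python
from collections import Counter, defaultdict


def build_index(sources):
    """Map each token to the set of TM entry ids whose source side contains it."""
    index = defaultdict(set)
    for i, sentence in enumerate(sources):
        for token in set(sentence.lower().split()):
            index[token].add(i)
    return index


def edit_distance(a, b):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def best_match(query, sources, index, top_k=5):
    """Return (entry id, distance) of the closest TM source, or None.

    Only the top_k entries sharing the most tokens with the query are
    scored with edit distance (an assumed pruning strategy), so the
    expensive comparison is not run against the whole TM.
    """
    tokens = query.lower().split()
    overlap = Counter()
    for token in set(tokens):
        for i in index.get(token, ()):
            overlap[i] += 1
    candidates = [i for i, _ in overlap.most_common(top_k)]
    if not candidates:
        return None
    scored = [(edit_distance(tokens, sources[i].lower().split()), i)
              for i in candidates]
    dist, i = min(scored)
    return i, dist


# Toy translation memory (source side only), purely illustrative.
tm_sources = [
    "where is the train station",
    "where is the bus stop",
    "i would like a cup of tea",
]
tm_index = build_index(tm_sources)
print(best_match("where is the bus stop please", tm_sources, tm_index))
```

In this sketch the third TM entry shares no tokens with the query and is never compared, which illustrates why index lookup can cut matching time. Whether pruning leaves translation quality unchanged depends on the candidate cutoff, which the abstract does not specify.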