Hybrid example-based SMT: the best of both worlds?

  • Authors:
  • Declan Groves;Andy Way

  • Affiliations:
  • Dublin City University, Dublin, Ireland;Dublin City University, Dublin, Ireland

  • Venue:
  • ParaText '05 Proceedings of the ACL Workshop on Building and Using Parallel Texts
  • Year:
  • 2005

Abstract

Way and Gough (2005) provide an in-depth comparison of their Example-Based Machine Translation (EBMT) system with a Statistical Machine Translation (SMT) system constructed from freely available tools. According to a wide variety of automatic evaluation metrics, they demonstrated that their EBMT system outperformed the SMT system by a factor of two to one. Nevertheless, they did not test their EBMT system against a phrase-based SMT system. Obtaining their training and test data for English–French, we carry out a number of experiments using the Pharaoh SMT Decoder. While better results are seen when Pharaoh is seeded with Giza++ word- and phrase-based data than with EBMT sub-sentential alignments, in general better results are obtained when combinations of this 'hybrid' data are used to construct the translation and probability models. While for the most part the EBMT system of Gough and Way (2004b) outperforms any flavour of the phrase-based SMT systems constructed in our experiments, combining the data sets automatically induced by both Giza++ and their EBMT system leads to a hybrid system which improves on the EBMT system per se for French–English.
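
The idea of pooling sub-sentential alignments from two sources before building a translation model can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`merge_phrase_pairs`, `relative_frequency`), the toy phrase pairs, and the use of plain relative-frequency estimation for p(target | source) are all assumptions made for illustration, and the abstract does not state how the probabilities were actually estimated.

```python
from collections import Counter, defaultdict


def merge_phrase_pairs(*sources):
    """Pool (source, target) phrase pairs from several extractors into one count table."""
    counts = Counter()
    for pairs in sources:
        counts.update(pairs)
    return counts


def relative_frequency(counts):
    """Estimate p(target | source) as count(source, target) / count(source)."""
    totals = defaultdict(int)
    for (src, _tgt), c in counts.items():
        totals[src] += c
    return {(src, tgt): c / totals[src] for (src, tgt), c in counts.items()}


# Hypothetical toy data standing in for the two alignment sources.
giza_pairs = [
    ("the house", "la maison"),
    ("the house", "la maison"),
    ("house", "maison"),
]
ebmt_pairs = [
    ("the house", "la maison"),
    ("the house", "le domicile"),
    ("house", "maison"),
]

hybrid_counts = merge_phrase_pairs(giza_pairs, ebmt_pairs)
hybrid_model = relative_frequency(hybrid_counts)

for (src, tgt), p in sorted(hybrid_model.items()):
    print(f"{src} -> {tgt}: {p:.2f}")
```

In this sketch, a pair proposed by both sources accumulates a higher count and so receives more probability mass, while a pair found by only one source is still retained in the pooled table.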