In previous work (Gough and Way 2004), we showed that our Example-Based Machine Translation (EBMT) system improved in both coverage and quality when seeded with increasing amounts of training data, significantly outperforming the on-line MT system Logomedia on a wide variety of automatic evaluation metrics. While it is perhaps unsurprising that system performance correlates with the amount of training data, in this paper we address the question of whether a large-scale, robust EBMT system such as ours can outperform a Statistical Machine Translation (SMT) system. We obtained a large English-French translation memory from Sun Microsystems, from which we randomly extracted a test set of nearly 4K sentence pairs. The remaining data was split into three training sets of roughly 50K, 100K and 200K sentence pairs in order to measure the effect of increasing training-data size on the performance of the two systems. Our main observation is that, contrary to received wisdom in the field, there appears to be little substance to the claim that SMT systems are guaranteed to outperform EBMT systems when confronted with ‘enough’ training data. Our tests on a 4.8-million-word bitext indicate that while SMT appears to outperform our system for French-English on a number of metrics, for English-French our EBMT system is superior to the baseline SMT model on all but one automatic evaluation metric.
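The data-preparation protocol described above can be sketched as follows. This is a minimal illustration, not the authors' actual scripts: the function name, the fixed seed, and the use of nested (prefix-contained) training sets are assumptions introduced for clarity.

```python
import random

def split_bitext(sentence_pairs, test_size=4000,
                 train_sizes=(50_000, 100_000, 200_000), seed=0):
    """Randomly hold out a test set from a bitext, then carve the
    remainder into training sets of increasing size.

    Hypothetical helper illustrating the paper's setup; the exact
    sampling procedure used by the authors is not specified.
    """
    rng = random.Random(seed)          # fixed seed for reproducibility
    pairs = list(sentence_pairs)
    rng.shuffle(pairs)
    test_set = pairs[:test_size]
    remainder = pairs[test_size:]
    # Nested training sets: each larger set contains the smaller ones,
    # so performance differences reflect added data, not resampling.
    train_sets = {n: remainder[:n] for n in train_sizes if n <= len(remainder)}
    return test_set, train_sets

# Toy usage with synthetic pairs standing in for the Sun Microsystems TM.
pairs = [(f"en-{i}", f"fr-{i}") for i in range(210_000)]
test_set, train_sets = split_bitext(pairs)
print(len(test_set), sorted(train_sets))   # 4000 [50000, 100000, 200000]
```

Both systems are then trained once per training-set size and scored on the single held-out test set, so that coverage and quality can be compared at matched amounts of training data.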