Integration of an Arabic transliteration module into a statistical machine translation system

Authors:
Mehdi M. Kashani;Eric Joanis;Roland Kuhn;George Foster;Fred Popowich
Affiliations:
Simon Fraser University, Burnaby, BC, Canada;NRC Institute for Information Technology, Gatineau, QC, Canada;NRC Institute for Information Technology, Gatineau, QC, Canada;NRC Institute for Information Technology, Gatineau, QC, Canada;Simon Fraser University, Burnaby, BC, Canada
Venue:
StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
Year:
2007

Abstract

We provide an in-depth analysis of the integration of an Arabic-to-English transliteration system into a general-purpose phrase-based statistical machine translation system. We study the integration from several aspects and use the BLEU metric to evaluate the improvement that can be attributed to it. Our experiments show that a transliteration module can help significantly when the test data is rich in previously unseen named entities. For OOV words (out-of-vocabulary source words that do not appear in the phrase table), we obtain 70% and 53% of the theoretical maximum improvement, as measured by an oracle, on the development and test sets respectively.
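
One way to read the "fraction of the theoretical maximum improvement" figure is as the system's gain over the baseline divided by the oracle's gain over the same baseline. The sketch below assumes that reading; the abstract does not spell out the formula. The function name and the BLEU scores are hypothetical and are not results from the paper.

```python
def fraction_of_oracle_gain(baseline: float, system: float, oracle: float) -> float:
    """Share of the oracle's gain over the baseline that the system recovers.

    Assumed reading of the abstract's percentages; all three arguments are
    scores on the same metric (e.g. BLEU) for the same data set.
    """
    headroom = oracle - baseline
    if headroom <= 0:
        raise ValueError("oracle score must exceed the baseline score")
    return (system - baseline) / headroom


# Made-up scores, for illustration only (not taken from the paper).
example = fraction_of_oracle_gain(baseline=40.0, system=41.4, oracle=42.0)
print(f"{example:.0%}")
```

Under this reading, a value of 1.0 would mean the transliteration module matches the oracle, and 0.0 would mean it adds nothing over the baseline.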