Machine translation of Arabic dialects

Authors:
Rabih Zbib;Erika Malchiodi;Jacob Devlin;David Stallard;Spyros Matsoukas;Richard Schwartz;John Makhoul;Omar F. Zaidan;Chris Callison-Burch
Affiliations:
Raytheon BBN Technologies, Cambridge MA;Raytheon BBN Technologies, Cambridge MA;Raytheon BBN Technologies, Cambridge MA;Raytheon BBN Technologies, Cambridge MA;Raytheon BBN Technologies, Cambridge MA;Raytheon BBN Technologies, Cambridge MA;Raytheon BBN Technologies, Cambridge MA;Microsoft Research, Redmond, WA;Johns Hopkins University, Baltimore MD
Venue:
NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Year:
2012



Abstract

Arabic dialects present many challenges for machine translation, not least of which is the lack of data resources. We use crowdsourcing to cheaply and quickly build Levantine-English and Egyptian-English parallel corpora, consisting of 1.1M words and 380k words, respectively. The dialectal sentences are selected from a large corpus of Arabic web text, and translated using Amazon's Mechanical Turk. We use this data to build Dialectal Arabic MT systems, and find that small amounts of dialectal data have a dramatic impact on translation quality. When translating Egyptian and Levantine test sets, our Dialectal Arabic MT system performs 6.3 and 7.0 BLEU points higher than a Modern Standard Arabic MT system trained on a 150M-word Arabic-English parallel corpus.