Combination of Arabic preprocessing schemes for statistical machine translation

Authors: Fatiha Sadat; Nizar Habash
Affiliations: National Research Council of Canada; Columbia University
Venue: ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Year: 2006

Abstract

Statistical machine translation is quite robust to the choice of input representation: it requires only that training and test data be preprocessed consistently. As a result, there is a wide range of possible preprocessing choices for data used in statistical machine translation, and even more so for morphologically rich languages such as Arabic. In this paper, we study the effect of different word-level preprocessing schemes for Arabic on the quality of phrase-based statistical machine translation. We also present and evaluate different methods for combining preprocessing schemes, which result in improved translation quality.
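
The consistency requirement mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two scheme functions, the one-letter prefix rule, and the transliterated toy sentences are all assumptions made for illustration. Real Arabic preprocessing depends on morphological disambiguation rather than string matching. The sketch applies the same scheme to training and test data and counts the test tokens that were never seen in training.

```python
from typing import Callable, List

# A "scheme" maps a raw sentence to a list of tokens (hypothetical type alias).
Scheme = Callable[[str], List[str]]


def scheme_whitespace(sentence: str) -> List[str]:
    """Baseline scheme: split on whitespace only."""
    return sentence.split()


def scheme_split_prefix(sentence: str, prefix: str = "w") -> List[str]:
    """Toy scheme: detach an assumed one-letter prefix and mark it with '+'.

    This is a deliberately naive string rule, used only to show the effect
    of a different tokenization; it is not a real morphological analysis.
    """
    tokens: List[str] = []
    for word in sentence.split():
        if word.startswith(prefix) and len(word) > len(prefix) + 1:
            tokens.extend([prefix + "+", word[len(prefix):]])
        else:
            tokens.append(word)
    return tokens


def oov_tokens(train: List[str], test: List[str], scheme: Scheme) -> List[str]:
    """Apply the SAME scheme to both sides, then list unseen test tokens."""
    vocab = {tok for sent in train for tok in scheme(sent)}
    return [tok for sent in test for tok in scheme(sent) if tok not in vocab]


# Illustrative transliterated toy data (assumption, not from the paper).
train = ["wktb Alwld", "ktb Alrjl"]
test = ["wAlrjl ktb"]

print("whitespace:", oov_tokens(train, test, scheme_whitespace))
print("split-prefix:", oov_tokens(train, test, scheme_split_prefix))
```

In this toy data the whitespace scheme leaves one test token unseen, while the prefix-splitting scheme leaves none, which hints at why the choice of scheme can matter for a morphologically rich language even though any consistently applied scheme is usable.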