Morpho-syntactic Arabic preprocessing for Arabic-to-English statistical machine translation

Authors:
Anas El Isbihani; Shahram Khadivi; Oliver Bender; Hermann Ney
Affiliations:
RWTH Aachen University, Aachen, Germany; RWTH Aachen University, Aachen, Germany; RWTH Aachen University, Aachen, Germany; RWTH Aachen University, Aachen, Germany
Venue:
StatMT '06 Proceedings of the Workshop on Statistical Machine Translation
Year:
2006


Abstract

Arabic has far richer systems of inflection and derivation than English, which has very little morphology. This morphological difference causes a large gap between the Arabic and English vocabulary sizes in any given parallel training corpus. Segmenting inflected Arabic words is one way to smooth the language's rich morphology. In this paper, we describe several statistically and linguistically motivated methods for Arabic word segmentation. We then show the efficiency of the proposed methods on the Arabic-English BTEC and NIST tasks.
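
To make the vocabulary-gap argument concrete, the following is a minimal illustrative sketch of prefix segmentation, not one of the methods described in the paper. It assumes Buckwalter-transliterated input, a small hypothetical prefix inventory (`PREFIXES`), and a hypothetical set of known stems (`STEMS`); a word is split only if stripping prefixes leaves a known stem, which guards against over-splitting words such as "klb" that merely begin with a prefix-like letter.

```python
# Illustrative sketch only: the prefix inventory, stem list, and toy corpus
# below are assumptions for demonstration, not data or methods from the paper.
# Words are written in Buckwalter transliteration (e.g. "ktAb" = book).

PREFIXES = ("w", "f", "b", "l", "k", "Al")  # hypothetical prefix inventory
STEMS = {"ktAb", "qlm"}                     # hypothetical known stems


def _split(word, stems):
    """Return [prefix+, ..., stem] if the word decomposes into known
    prefixes followed by a known stem, otherwise None."""
    if word in stems:
        return [word]
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) > len(prefix):
            tail = _split(word[len(prefix):], stems)
            if tail is not None:
                # Mark separated prefixes with "+" so they stay distinct
                # from free-standing words that happen to look the same.
                return [prefix + "+"] + tail
    return None


def segment(word, stems=STEMS):
    """Segment a word, leaving it unsplit when no analysis is found."""
    return _split(word, stems) or [word]


def vocabulary(tokens):
    """Set of distinct token types."""
    return set(tokens)


corpus = ["ktAb", "AlktAb", "wAlktAb", "bAlktAb", "wktAb",
          "qlm", "Alqlm", "bAlqlm"]

segmented = [piece for word in corpus for piece in segment(word)]

print(len(vocabulary(corpus)))     # distinct types before segmentation
print(len(vocabulary(segmented)))  # distinct types after segmentation
```

On this toy corpus the eight distinct surface forms reduce to five token types (two stems and three prefixes), which is the kind of vocabulary reduction the abstract refers to. A real system would need a disambiguation step, since many Arabic words admit more than one analysis.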