Context-based Arabic morphological analysis for machine translation

Authors:
ThuyLinh Nguyen;Stephan Vogel
Affiliations:
Carnegie Mellon University, Pittsburgh, PA;Carnegie Mellon University, Pittsburgh, PA
Venue:
CoNLL '08 Proceedings of the Twelfth Conference on Computational Natural Language Learning
Year:
2008

Citing 13
Cited 2

Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data

ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
A systematic comparison of various statistical alignment models

Computational Linguistics
The mathematics of statistical machine translation: parameter estimation

Computational Linguistics - Special issue on using large corpora: II
An empirical study of smoothing techniques for language modeling

ACL '96 Proceedings of the 34th annual meeting on Association for Computational Linguistics
BLEU: a method for automatic evaluation of machine translation

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Shallow parsing with conditional random fields

NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Statistical Machine Translation with Scarce Resources Using Morpho-syntactic Information

Computational Linguistics
Discriminative training methods for hidden Markov models: theory and experiments with perceptron algorithms

EMNLP '02 Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10
Combination of Arabic preprocessing schemes for statistical machine translation

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Context-based morphological disambiguation with random fields

HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Improving statistical MT through morphological analysis

HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Moses: open source toolkit for statistical machine translation

ACL '07 Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions
Morphological analysis for statistical machine translation

HLT-NAACL-Short '04 Proceedings of HLT-NAACL 2004: Short Papers

Enhancing morphological alignment for translating highly inflected languages

COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
A comparison of segmentation methods and extended lexicon models for Arabic statistical machine translation

Machine Translation

Quantified Score

Hi-index	0.00

Visualization

Abstract

In this paper, we present a novel morphology preprocessing technique for Arabic-English translation. We exploit the Arabic morphology-English alignment to learn a model removing nonaligned Arabic morphemes. The model is an instance of the Conditional Random Field (Lafferty et al., 2001) model; it deletes a morpheme based on the morpheme's context. We achieved around two BLEU points improvement over the original Arabic translation for both a travel-domain system trained on 20K sentence pairs and a news domain system trained on 177K sentence pairs, and showed a potential improvement for a large-scale SMT system trained on 5 million sentence pairs.