Improved Arabic-to-English statistical machine translation by reordering post-verbal subjects for word alignment

Authors:
Marine Carpuat;Yuval Marton;Nizar Habash
Affiliations:
National Research Council, Gatineau, Canada J8X 3X7;IBM T. J. Watson Research Center, Yorktown Heights, USA 10598;Columbia University Center for Computational Learning Systems, New York, USA 10115
Venue:
Machine Translation
Year:
2012

Citing 27
Cited 1

Foundations of statistical natural language processing

Foundations of statistical natural language processing
A systematic comparison of various statistical alignment models

Computational Linguistics
The mathematics of statistical machine translation: parameter estimation

Computational Linguistics - Special issue on using large corpora: II
HMM-based word alignment in statistical translation

COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 2
BLEU: a method for automatic evaluation of machine translation

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Intricacies of Collins' Parsing Model

Computational Linguistics
Clause restructuring for statistical machine translation

ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Arabic tokenization, part-of-speech tagging and morphological disambiguation in one fell swoop

ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Maximum entropy based restoration of Arabic diacritics

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Improving a statistical MT system with automatically learned rewrite patterns

COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Algorithms for deterministic incremental dependency parsing

Computational Linguistics
Arabic Natural Language Processing

Arabic Natural Language Processing
Arabic morphological tagging, diacritization, and lemmatization using lexeme models and feature ranking

HLT-Short '08 Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers
Moses: open source toolkit for statistical machine translation

ACL '07 Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions
Arabic preprocessing schemes for statistical machine translation

NAACL-Short '06 Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers
Cohesive constraints in a beam search phrase-based decoder

NAACL-Short '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers
Chunk-level reordering of source language sentences with automatically learned rules for statistical machine translation

SSST '07 Proceedings of the NAACL-HLT 2007/AMTA Workshop on Syntax and Structure in Statistical Translation
Using syntax to improve word alignment precision for syntax-based machine translation

StatMT '08 Proceedings of the Third Workshop on Statistical Machine Translation
Using shallow syntax information to improve word alignment and reordering for SMT

StatMT '08 Proceedings of the Third Workshop on Statistical Machine Translation
CATiB: the Columbia Arabic Treebank

ACLShort '09 Proceedings of the ACL-IJCNLP 2009 Conference Short Papers
Improved word alignment with statistics and linguistic heuristics

EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1 - Volume 1
Improving Arabic-to-English statistical machine translation by reordering post-verbal subjects for alignment

ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers
Improving Arabic dependency parsing with lexical and inflectional morphological features

SPMRL '10 Proceedings of the NAACL HLT 2010 First Workshop on Statistical Parsing of Morphologically-Rich Languages
Chunk-based verb reordering in VSO sentences for Arabic-English statistical machine translation

WMT '10 Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
Better Arabic parsing: baselines, evaluations, and analysis

COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Clause restructuring for SMT not absolutely helpful

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Fuzzy syntactic reordering for phrase-based statistical machine translation

WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation

Encouraging consistent translation choices

NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Quantified Score

Hi-index	0.00

Visualization

Abstract

We study challenges raised by the order of Arabic verbs and their subjects in statistical machine translation (SMT). We show that the boundaries of post-verbal subjects (VS) are hard to detect accurately, even with a state-of-the-art Arabic dependency parser. In addition, VS constructions have highly ambiguous reordering patterns when translated to English, and these patterns are very different for matrix (main clause) VS and non-matrix (subordinate clause) VS. Based on this analysis, we propose a novel method for leveraging VS information in SMT: we reorder VS constructions into pre-verbal (SV) order for word alignment. Unlike previous approaches to source-side reordering, phrase extraction and decoding are performed using the original Arabic word order. This strategy significantly improves BLEU and TER scores, even on a strong large-scale baseline. Limiting reordering to matrix VS yields further improvements.