Chunk-lattices for verb reordering in Arabic---English statistical machine translation

Authors:
Arianna Bisazza;Daniele Pighin;Marcello Federico
Affiliations:
Fondazione Bruno Kessler-IRST, Trento, Italy;Fondazione Bruno Kessler-IRST, Trento, Italy and TALP Research Center-Universitat Politécnica de Catalunya, Barcelona, Spain;Fondazione Bruno Kessler-IRST, Trento, Italy
Venue:
Machine Translation
Year:
2012

Citing 22
Cited 2

A training algorithm for optimal margin classifiers

COLT '92 Proceedings of the fifth annual workshop on Computational learning theory
Phrase-Based Statistical Machine Translation

KI '02 Proceedings of the 25th Annual German Conference on AI: Advances in Artificial Intelligence
A systematic comparison of various statistical alignment models

Computational Linguistics
Fast decoding and optimal decoding for machine translation

ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
Minimum error rate training in statistical machine translation

ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Distortion models for statistical machine translation

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Alignment by agreement

HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
Moses: open source toolkit for statistical machine translation

ACL '07 Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions
A simple and effective hierarchical phrase reordering model

EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
A unigram orientation model for statistical machine translation

HLT-NAACL-Short '04 Proceedings of HLT-NAACL 2004: Short Papers
Automatic tagging of Arabic text: from raw text to base phrase chunks

HLT-NAACL-Short '04 Proceedings of HLT-NAACL 2004: Short Papers
Syntactic reordering for English-Arabic phrase-based machine translation

Semitic '09 Proceedings of the EACL 2009 Workshop on Computational Approaches to Semitic Languages
Chunk-level reordering of source language sentences with automatically learned rules for statistical machine translation

SSST '07 Proceedings of the NAACL-HLT 2007/AMTA Workshop on Syntax and Structure in Statistical Translation
Using shallow syntax information to improve word alignment and reordering for SMT

StatMT '08 Proceedings of the Third Workshop on Statistical Machine Translation
A quantitative analysis of reordering phenomena

StatMT '09 Proceedings of the Fourth Workshop on Statistical Machine Translation
A POS-based model for long-range reorderings in SMT

StatMT '09 Proceedings of the Fourth Workshop on Statistical Machine Translation
Discriminative reordering models for statistical machine translation

StatMT '06 Proceedings of the Workshop on Statistical Machine Translation
Metrics for MT evaluation: evaluating reordering

Machine Translation
Improved models of distortion cost for statistical machine translation

HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Improving Arabic-to-English statistical machine translation by reordering post-verbal subjects for alignment

ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers
FBK at WMT 2010: word lattices for morphological reduction and chunk-based reordering

WMT '10 Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
Chunk-based verb reordering in VSO sentences for Arabic-English statistical machine translation

WMT '10 Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

Modified distortion matrices for phrase-based statistical machine translation

ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
Automatic dominant character identification in fables based on verb analysis - Empirical study on the impact of anaphora resolution

Knowledge-Based Systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

Syntactic disfluencies in Arabic-to-English phrase-based SMT output are often due to incorrect verb reordering in Verb---Subject---Object sentences. As a solution, we propose a chunk-based reordering technique to automatically displace clause-initial verbs in the Arabic side of a word-aligned parallel corpus. This method is used to preprocess the training data, and to collect statistics about verb movements. From this analysis we build specific verb reordering lattices on the test sentences before decoding, and test different lattice-weighting schemes. Finally, we train a feature-rich discriminative model to predict likely verb reorderings for a given Arabic sentence. The model scores are used to prune the reordering lattice, leading to better word reordering at decoding time. The application of our reordering methods to the training and test data results in consistent improvements on the NIST-MT 2009 Arabic---English benchmark, both in terms of BLEU (+1.06%) and of reordering quality (+0.85%) measured with the Kendall Reordering Score.