Statistical translation after source reordering: Oracles, context-aware models, and empirical analysis

  • Authors:
  • Maxim Khalilov; Khalil Sima'an

  • Affiliations:
  • Institute for Logic, Language and Computation, University of Amsterdam, P.O. Box 94242, 1090 GE Amsterdam, The Netherlands. E-mails: maxim@tauslabs.com, k.simaan@uva.nl

  • Venue:
  • Natural Language Engineering
  • Year:
  • 2011


Abstract

In source reordering, the order of the source words is permuted to minimize word-order differences with the target sentence before being fed to a translation model. Earlier work highlights the benefits of resolving long-distance reorderings as a pre-processing step to standard phrase-based models. However, the potential performance improvement of source reordering and its impact on the components of the subsequent translation model remain unexplored. In this paper we study both aspects of source reordering. We set up idealized source reordering (oracle) models with and without syntax, and present our own syntax-driven model of source reordering. The latter is a statistical model of inversion transduction grammar (ITG)-like tree transductions that manipulate a syntactic parse and work with novel conditional reordering parameters. Having set up the models, we report translation experiments showing significant improvement on three language pairs, and contribute an extensive analysis of the impact of source reordering (both oracle and model) on the translation model, regarding the quality of its input, its phrase table, and its output. Our experiments show that oracle source reordering has untapped potential for improving translation system output. Besides solving difficult reorderings, we find that source reordering creates more monotone parallel training data at the back end, leading to significantly larger phrase tables with higher coverage of phrase types in unseen data. Unfortunately, this nice property does not carry over to tree-constrained source reordering. Our analysis shows that, from the string-level perspective, tree-constrained reordering may selectively permute word order, leading to larger phrase tables but without an increase in phrase coverage on unseen data.
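
To illustrate the notion of oracle source reordering summarized in the abstract, the sketch below permutes the source words so that they follow the order of their aligned target words, yielding a more monotone source/target pair for the downstream translation model. This is a minimal Python sketch under assumed inputs (word alignments given as a source-to-target index mapping); the function name `oracle_reorder` and the mean-target-position heuristic are illustrative choices, not the paper's exact procedure.

```python
# Hypothetical sketch of oracle source reordering from word alignments.
# Source words are sorted by the positions of their aligned target words,
# so the reordered source reads monotonically with respect to the target.

def oracle_reorder(source_tokens, alignment):
    """Reorder source tokens by the mean position of their aligned target words.

    source_tokens: list of source words
    alignment: dict mapping source index -> list of aligned target indices
               (unaligned source words keep their original position as a tie-break)
    """
    def sort_key(i):
        targets = alignment.get(i, [])
        # Aligned words move to the average position of their target counterparts;
        # unaligned words fall back to their original position.
        anchor = sum(targets) / len(targets) if targets else float(i)
        return (anchor, i)

    order = sorted(range(len(source_tokens)), key=sort_key)
    return [source_tokens[i] for i in order]


if __name__ == "__main__":
    # Toy German-English example with a long-distance verb reordering:
    # "ich habe das Buch gelesen" vs. "I have read the book"
    src = ["ich", "habe", "das", "Buch", "gelesen"]
    align = {0: [0], 1: [1], 2: [3], 3: [4], 4: [2]}  # source idx -> target idx
    print(oracle_reorder(src, align))  # ['ich', 'habe', 'gelesen', 'das', 'Buch']
```

Such an alignment-driven permutation corresponds to the unconstrained (string-level) oracle; the paper's syntax-driven model instead restricts permutations to ITG-like transductions over a source parse tree.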