Efficient accurate syntactic direct translation models: one tree at a time

Authors:
Hany Hassan; Khalil Sima'an; Andy Way
Affiliations:
Microsoft Research, Redmond, USA; University of Amsterdam, Amsterdam, The Netherlands; Dublin City University, Dublin, Ireland
Venue:
Machine Translation
Year:
2012

Abstract

A challenging aspect of Statistical Machine Translation from Arabic to English lies in bringing the Arabic source morpho-syntax to bear on both the lexical and the word-order choices of the English target string. In this article, we extend the feature-rich discriminative Direct Translation Model 2 (DTM2) with a novel linear-time parsing algorithm based on an eager, incremental interpretation of Combinatory Categorial Grammar (CCG). In this way we reap the benefits of target-side syntactic enhancement, which leads to more grammatical output, while still allowing dynamic decoding without the risk of blowing up decoding space and time requirements. Our model defines a mix of parameters: some involve DTM2 source morpho-syntactic features, while others are novel target-side syntactic features. Alongside translation features extracted from the derived parse tree, we explore syntactic features extracted from the incremental derivation process. Our experiments show that our model significantly outperforms the state-of-the-art DTM2 system.
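
To make the idea of eager, incremental Combinatory Categorial Grammar parsing concrete, the following is a minimal sketch and not the algorithm of the article. It assumes that each target word already carries a single supertag, that the parser keeps exactly one state category, and that only forward application, forward composition, backward application and a fixed type-raising rule (X to S/(S\X)) are available. The names `Fn`, `combine`, `type_raise` and `parse_incrementally` are hypothetical. Each word triggers a constant number of combination attempts, which is what makes this toy version linear in sentence length.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass(frozen=True)
class Fn:
    """Complex category: result/arg (forward) or result\\arg (backward)."""
    result: "Cat"
    slash: str  # "/" or "\\"
    arg: "Cat"


# A category is either atomic (a string such as "NP") or a function category.
Cat = Union[str, Fn]


def combine(left: Cat, right: Cat) -> Optional[Cat]:
    """Try the three combinators assumed in this sketch, in a fixed order."""
    # Forward application: X/Y  Y  =>  X
    if isinstance(left, Fn) and left.slash == "/" and left.arg == right:
        return left.result
    # Forward composition: X/Y  Y/Z  =>  X/Z
    if (isinstance(left, Fn) and left.slash == "/"
            and isinstance(right, Fn) and right.slash == "/"
            and left.arg == right.result):
        return Fn(left.result, "/", right.arg)
    # Backward application: Y  X\Y  =>  X
    if isinstance(right, Fn) and right.slash == "\\" and right.arg == left:
        return right.result
    return None


def type_raise(cat: Cat) -> Cat:
    """Assumed type-raising rule: X => S/(S\\X)."""
    return Fn("S", "/", Fn("S", "\\", cat))


def parse_incrementally(supertags: List[Cat]) -> Tuple[Optional[Cat], List[Cat]]:
    """Consume supertags left to right, keeping one state category.

    Returns the final state (None on failure) and the state after each word.
    """
    state: Optional[Cat] = None
    trace: List[Cat] = []
    for cat in supertags:
        if state is None:
            state = cat
        else:
            new = combine(state, cat)
            if new is None:
                # Fall back to type-raising the current state once.
                new = combine(type_raise(state), cat)
            if new is None:
                return None, trace
            state = new
        trace.append(state)
    return state, trace


# "John likes Mary" with supertags NP, (S\NP)/NP, NP
transitive_verb = Fn(Fn("S", "\\", "NP"), "/", "NP")
final, trace = parse_incrementally(["NP", transitive_verb, "NP"])
```

In this example the state after "John likes" is S/NP, a partial constituent that a non-incremental CCG derivation would not normally build. Per-word states of this kind are the sort of object from which features of the incremental derivation process could be read, although the feature set used in the article is not specified here.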