A syntax-based statistical translation model
ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
BLEU: a method for automatic evaluation of machine translation
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Statistical phrase-based translation
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
A hierarchical phrase-based model for statistical machine translation
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Scalable inference and training of context-rich syntactic translation models
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Hierarchical Phrase-Based Translation
Computational Linguistics
Using machine-learning to assign function labels to parser output for Spanish
COLING-ACL '06 Proceedings of the COLING/ACL on Main conference poster sessions
Design of a multi-lingual, parallel-processing statistical parsing engine
HLT '02 Proceedings of the second international conference on Human Language Technology Research
Automatic evaluation of machine translation quality using n-gram co-occurrence statistics
HLT '02 Proceedings of the second international conference on Human Language Technology Research
Moses: open source toolkit for statistical machine translation
ACL '07 Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions
Automatic generation of parallel treebanks
COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
A systematic comparison of phrase-based, hierarchical and syntax-augmented statistical MT
COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
Multi-dimensional annotation and alignment in an English-German translation corpus
NLPXML '06 Proceedings of the 5th Workshop on NLP and XML: Multi-Dimensional Markup in Natural Language Processing
SSST '08 Proceedings of the Second Workshop on Syntax and Structure in Statistical Translation
Decoding with syntactic and non-syntactic phrases in a syntax-based machine translation system
SSST '09 Proceedings of the Third Workshop on Syntax and Structure in Statistical Translation
Meteor: an automatic metric for MT evaluation with high levels of correlation with human judgments
StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
Stat-XFER: a general search-based syntax-driven framework for machine translation
CICLing'08 Proceedings of the 9th international conference on Computational linguistics and intelligent text processing
Using common sense to generate culturally contextualized machine translation
YIWCALA '10 Proceedings of the NAACL HLT 2010 Young Investigators Workshop on Computational Approaches to Languages of the Americas
Panning for EBMT gold, or "Remembering not to forget"
Machine Translation
Hi-index | 0.00 |
Given much recent discussion and the shift in focus of the field, it is becoming apparent that the incorporation of syntax is the way forward for improvements to the current state-of-the-art in machine translation (MT). Parallel treebanks are a relatively recent innovation and appear to be ideal candidates for MT training material. However, until recently there has been no other means to build them than by hand. In this paper, we describe how we make use of new tools to automatically build a large parallel treebank and extract a set of linguistically-motivated phrase pairs from it. We show that adding these phrase pairs to the translation model of a baseline phrase-based statistical MT (PB-SMT) system leads to significant improvements in translation quality. Following this, we describe experiments in which we exploit the information encoded in the parallel treebank in other areas of the PB-SMT framework, while investigating the conditions under which the incorporation of parallel treebank data performs optimally. Finally, we discuss the possibility of exploiting automatically-generated parallel treebanks further in syntax-aware paradigms of MT.