Automatically generated parallel treebanks and their exploitability in machine translation

Authors:
John Tinsley; Andy Way
Affiliations:
School of Computing, National Centre for Language Technology, Dublin City University, Dublin 9, Ireland; School of Computing, National Centre for Language Technology, Dublin City University, Dublin 9, Ireland
Venue:
Machine Translation
Year:
2009

Abstract

Given much recent discussion and the shift in focus of the field, it is becoming apparent that the incorporation of syntax is the way forward for improvements to the current state-of-the-art in machine translation (MT). Parallel treebanks are a relatively recent innovation and appear to be ideal candidates for MT training material. However, until recently there has been no other means to build them than by hand. In this paper, we describe how we make use of new tools to automatically build a large parallel treebank and extract a set of linguistically-motivated phrase pairs from it. We show that adding these phrase pairs to the translation model of a baseline phrase-based statistical MT (PB-SMT) system leads to significant improvements in translation quality. Following this, we describe experiments in which we exploit the information encoded in the parallel treebank in other areas of the PB-SMT framework, while investigating the conditions under which the incorporation of parallel treebank data performs optimally. Finally, we discuss the possibility of exploiting automatically-generated parallel treebanks further in syntax-aware paradigms of MT.