Corpus expansion for statistical machine translation with semantic role label substitution rules

Authors:
Qin Gao; Stephan Vogel
Affiliations:
Carnegie Mellon University, Pittsburgh, PA; Carnegie Mellon University, Pittsburgh, PA
Venue:
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Year:
2011

Abstract

We present an approach to expanding parallel corpora for machine translation. By applying semantic role labeling (SRL) to one side of the language pair, we extract SRL substitution rules from an existing parallel corpus. These rules are then used to generate new sentence pairs, and an SVM classifier is trained to filter the generated pairs. The filtered corpus is used to train phrase-based translation models, which can be used directly in translation tasks or combined with the baseline models. Experimental results on Chinese-English machine translation tasks show an average improvement of 0.45 BLEU points and 1.22 TER points across five different NIST test sets.
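The abstract does not spell out the rule format, so the Python sketch below is only one plausible reading of "SRL substitution rules," not the authors' method. It assumes that SRL labels sit on the English (target) side, that a word alignment is available, and that a rule is simply an argument span of a given label paired with its consistently aligned source span. Applying a rule swaps both sides of a same-label argument in another sentence pair. The names `SentencePair`, `Rule`, `aligned_src_span`, `extract_rules`, and `apply_rule` are hypothetical. The toy Chinese tokens are romanized, and the SVM filtering step is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

Span = Tuple[int, int]  # half-open token span [start, end)


@dataclass
class SentencePair:
    src: List[str]                 # source tokens (e.g. Chinese)
    tgt: List[str]                 # target tokens (e.g. English), assumed SRL-labeled
    align: Set[Tuple[int, int]]    # word alignment links (src_index, tgt_index)
    roles: List[Tuple[str, Span]]  # hypothetical SRL output: (label, target span)


@dataclass(frozen=True)
class Rule:
    label: str             # SRL label, e.g. "ARG1"
    tgt: Tuple[str, ...]   # target-side argument phrase
    src: Tuple[str, ...]   # aligned source-side phrase


def aligned_src_span(pair: SentencePair, span: Span) -> Optional[Span]:
    """Return the source span aligned to a target span, or None if the span
    is unaligned or a link inside the source span leaves the target span."""
    s, e = span
    src_idx = [i for i, j in pair.align if s <= j < e]
    if not src_idx:
        return None
    lo, hi = min(src_idx), max(src_idx) + 1
    for i, j in pair.align:
        # Consistency check, as in phrase extraction.
        if lo <= i < hi and not (s <= j < e):
            return None
    return lo, hi


def extract_rules(pair: SentencePair) -> List[Rule]:
    """Turn every consistently aligned SRL argument into a substitution rule."""
    rules = []
    for label, (s, e) in pair.roles:
        src_span = aligned_src_span(pair, (s, e))
        if src_span is not None:
            lo, hi = src_span
            rules.append(Rule(label, tuple(pair.tgt[s:e]), tuple(pair.src[lo:hi])))
    return rules


def apply_rule(pair: SentencePair, rule: Rule) -> List[Tuple[List[str], List[str]]]:
    """Generate new (src, tgt) pairs by swapping each argument with the rule's label."""
    out = []
    for label, (s, e) in pair.roles:
        if label != rule.label:
            continue
        src_span = aligned_src_span(pair, (s, e))
        if src_span is None:
            continue
        lo, hi = src_span
        new_src = pair.src[:lo] + list(rule.src) + pair.src[hi:]
        new_tgt = pair.tgt[:s] + list(rule.tgt) + pair.tgt[e:]
        if (new_src, new_tgt) != (pair.src, pair.tgt):  # skip no-op swaps
            out.append((new_src, new_tgt))
    return out


if __name__ == "__main__":
    # Invented toy data with romanized Chinese tokens.
    p1 = SentencePair(["ta", "xihuan", "pingguo"], ["he", "likes", "apples"],
                      {(0, 0), (1, 1), (2, 2)}, [("ARG0", (0, 1)), ("ARG1", (2, 3))])
    p2 = SentencePair(["wo", "mai", "le", "xiangjiao"], ["i", "bought", "bananas"],
                      {(0, 0), (1, 1), (3, 2)}, [("ARG0", (0, 1)), ("ARG1", (2, 3))])
    arg1_rule = extract_rules(p2)[1]  # ARG1: "bananas" <-> "xiangjiao"
    print(apply_rule(p1, arg1_rule))
    # → [(['ta', 'xihuan', 'xiangjiao'], ['he', 'likes', 'bananas'])]
```

Swapping blindly can also produce bad pairs. For example, applying the ARG0 rule from `p2` to `p1` yields "i likes apples." That is presumably the kind of output a filtering step such as the paper's SVM classifier would need to reject.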