Automatic category label coarsening for syntax-based machine translation

Authors:
Greg Hanneman;Alon Lavie
Affiliations:
Carnegie Mellon University, Pittsburgh, PA;Carnegie Mellon University, Pittsburgh, PA
Venue:
SSST-5 Proceedings of the Fifth Workshop on Syntax, Semantics and Structure in Statistical Translation
Year:
2011

Citing 9
Cited 0

BLEU: a method for automatic evaluation of machine translation

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Tree-to-string alignment template for statistical machine translation

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Automatic generation of parallel treebanks

COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
Preference grammars: softening syntactic constraints to improve statistical machine translation

NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Syntax-driven learning of sub-sentential translation equivalents and translation rules from parsed parallel corpora

SSST '08 Proceedings of the Second Workshop on Syntax and Structure in Statistical Translation
Joshua: an open source toolkit for parsing-based machine translation

StatMT '09 Proceedings of the Fourth Workshop on Statistical Machine Translation
Syntax augmented machine translation via chart parsing

StatMT '06 Proceedings of the Workshop on Statistical Machine Translation
The Meteor metric for automatic evaluation of machine translation

Machine Translation
Learning to translate with source and target syntax

ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

Quantified Score

Hi-index	0.00

Visualization

Abstract

We consider SCFG-based MT systems that get syntactic category labels from parsing both the source and target sides of parallel training data. The resulting joint nonterminals often lead to needlessly large label sets that are not optimized for an MT scenario. This paper presents a method of iteratively coarsening a label set for a particular language pair and training corpus. We apply this label collapsing on Chinese--English and French--English grammars, obtaining test-set improvements of up to 2.8 BLEU, 5.2 TER, and 0.9 METEOR on Chinese--English translation. An analysis of label collapsing's effect on the grammar and the decoding process is also given.