Disambiguating "DE" for Chinese-English machine translation

Authors:
Pi-Chuan Chang;Dan Jurafsky;Christopher D. Manning
Affiliations:
Stanford University, Stanford, CA;Stanford University, Stanford, CA;Stanford University, Stanford, CA
Venue:
StatMT '09 Proceedings of the Fourth Workshop on Statistical Machine Translation
Year:
2009

Abstract

Linking constructions involving 的 (DE) are ubiquitous in Chinese, and can be translated into English in many different ways. This is a major source of machine translation error, even when syntax-sensitive translation models are used. This paper explores how more information about the syntactic, semantic, and discourse context of uses of 的 (DE) can help in choosing an appropriate English translation strategy. We describe a finer-grained classification of 的 (DE) constructions in Chinese NPs, construct a corpus of annotated examples, and then train a log-linear classifier that uses linguistically inspired features. We use the DE classifier to preprocess MT data by explicitly labeling 的 (DE) constructions, as well as reordering phrases, and show that our approach provides significant BLEU point gains on MT02 (+1.24), MT03 (+0.88) and MT05 (+1.49) on a phrase-based system. The improvement persists when a hierarchical reordering model is applied.
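To make the "log-linear classifier" mentioned in the abstract concrete, here is a minimal sketch of how such a model turns active features into class probabilities. It is not the authors' implementation: the label names, feature names, and weights below are hypothetical placeholders, since the abstract does not list the actual DE classes or features.

```python
import math

# Hypothetical labels and weights, for illustration only. The real DE classes
# and linguistically inspired features are defined in the paper, not here.
LABELS = ["label_1", "label_2"]
WEIGHTS = {
    ("label_1", "feat_a"): 1.0,
    ("label_2", "feat_b"): 1.0,
}


def log_linear_probs(active_features, labels=LABELS, weights=WEIGHTS):
    """Return P(label | features) under a log-linear (softmax) model."""
    # Score of a label = sum of the weights of its active (label, feature) pairs.
    scores = {
        y: sum(weights.get((y, f), 0.0) for f in active_features)
        for y in labels
    }
    # Subtract the maximum score before exponentiating, for numerical stability.
    top = max(scores.values())
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}


def classify(active_features, labels=LABELS, weights=WEIGHTS):
    """Pick the most probable label for one DE occurrence."""
    probs = log_linear_probs(active_features, labels, weights)
    return max(probs, key=probs.get)
```

In a preprocessing pipeline of the kind the abstract describes, the predicted label would be attached to each 的 token before the MT system is trained; how the labels then drive phrase reordering is specific to the paper and is not sketched here.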