Decision trees for lexical smoothing in statistical machine translation

Authors:
Rabih Zbib;Spyros Matsoukas;Richard Schwartz;John Makhoul
Affiliations:
Massachusetts Institute of Technology, Cambridge, MA;BBN Technologies, Cambridge, MA;BBN Technologies, Cambridge, MA;BBN Technologies, Cambridge, MA
Venue:
WMT '10 Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
Year:
2010

Citing 14
Cited 1

A systematic comparison of various statistical alignment models

Computational Linguistics
Pattern Classification (2nd Edition)

Pattern Classification (2nd Edition)
The mathematics of statistical machine translation: parameter estimation

Computational Linguistics - Special issue on using large corpora: II
BLEU: a method for automatic evaluation of machine translation

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Statistical phrase-based translation

NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Minimum error rate training in statistical machine translation

ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Tree-based state tying for high accuracy acoustic modelling

HLT '94 Proceedings of the workshop on Human Language Technology
Maximum entropy based restoration of Arabic diacritics

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Word-sense disambiguation for machine translation

HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Arabic diacritization through full morphological tagging

NAACL-Short '07 Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers
Context-dependent alignment models for statistical machine translation

NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Arabic diacritization using weighted finite-state transducers

Semitic '05 Proceedings of the ACL Workshop on Computational Approaches to Semitic Languages
Automatic diacritization of Arabic for acoustic modeling in speech recognition

Semitic '04 Proceedings of the Workshop on Computational Approaches to Arabic Script-based Languages
Rich source-side context for statistical machine translation

StatMT '08 Proceedings of the Third Workshop on Statistical Machine Translation

Methods for integrating rule-based and statistical systems for Arabic to English machine translation

Machine Translation

Quantified Score

Hi-index	0.00

Visualization

Abstract

We present a method for incorporating arbitrary context-informed word attributes into statistical machine translation by clustering attribute-qualified source words, and smoothing their word translation probabilities using binary decision trees. We describe two ways in which the decision trees are used in machine translation: by using the attribute-qualified source word clusters directly, or by using attribute-dependent lexical translation probabilities that are obtained from the trees, as a lexical smoothing feature in the decoder model. We present experiments using Arabic-to-English newswire data, and using Arabic diacritics and part-of-speech as source word attributes, and show that the proposed method improves on a state-of-the-art translation system.