Why generative phrase models underperform surface heuristics

  • Authors:
  • John DeNero; Dan Gillick; James Zhang; Dan Klein

  • Affiliations:
  • University of California, Berkeley, CA (all authors)

  • Venue:
  • StatMT '06: Proceedings of the Workshop on Statistical Machine Translation
  • Year:
  • 2006

Abstract

We investigate why weights from generative models underperform heuristic estimates in phrase-based machine translation. We first propose a simple generative, phrase-based model and verify that its estimates are inferior to those given by surface statistics. The performance gap stems primarily from the addition of a hidden segmentation variable, which increases the capacity for overfitting during maximum likelihood training with EM. In particular, while word-level models benefit greatly from re-estimation, phrase-level models do not: the crucial difference is that distinct word alignments cannot all be correct, while distinct segmentations can. Alternate segmentations, rather than alternate alignments, compete, resulting in increased determinization of the phrase table, decreased generalization, and a decreased final BLEU score. We also show that interpolating the two methods can yield a modest increase in BLEU score.
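
A minimal sketch (not from the paper) of the interpolation idea in the final sentence: mixing heuristic relative-frequency phrase weights with EM-trained generative weights. The phrase pairs, probabilities, and the mixing weight lam below are hypothetical placeholders, not values reported by the authors.

    def interpolate_phrase_tables(heuristic, generative, lam=0.5):
        """Score each phrase pair as lam * heuristic + (1 - lam) * generative."""
        pairs = set(heuristic) | set(generative)
        return {
            pair: lam * heuristic.get(pair, 0.0)
                  + (1.0 - lam) * generative.get(pair, 0.0)
            for pair in pairs
        }

    # Toy example: the EM-trained table has "determinized" onto one phrase
    # pair, while the heuristic table spreads mass over alternatives.
    heuristic = {("maison", "house"): 0.6, ("maison", "home"): 0.4}
    generative = {("maison", "house"): 0.99, ("maison", "home"): 0.01}

    print(interpolate_phrase_tables(heuristic, generative, lam=0.5))
    # e.g. {('maison', 'house'): 0.795, ('maison', 'home'): 0.205} (order may vary)

The interpolated table keeps some of the heuristic estimate's spread over alternative translations while still incorporating the generative model's scores, which is consistent with the abstract's claim that combining the two methods can modestly improve BLEU.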