We investigate why weights from generative models underperform heuristic estimates in phrase-based machine translation. We first propose a simple generative, phrase-based model and verify that its estimates are inferior to those given by surface statistics. The performance gap stems primarily from the addition of a hidden segmentation variable, which increases the capacity for overfitting during maximum likelihood training with EM. In particular, while word-level models benefit greatly from re-estimation, phrase-level models do not: the crucial difference is that distinct word alignments cannot all be correct, while distinct segmentations can. Alternate segmentations, rather than alternate alignments, compete, resulting in increased determinization of the phrase table, decreased generalization, and a decreased final BLEU score. We also show that interpolation of the two methods can result in a modest increase in BLEU score.
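As a rough illustration of the two estimates and their interpolation, the sketch below computes a relative-frequency ("surface statistics") phrase table from extracted phrase pairs and mixes it linearly with a second, more peaked table standing in for EM-trained weights. The abstract does not specify the interpolation scheme, so the linear mixture, the weight `lam`, the function names, and the toy phrase pairs are all assumptions made for illustration. Entropy is used here only as a simple proxy for how determinized a phrase-table entry is.

```python
import math
from collections import Counter, defaultdict


def relative_frequency(phrase_pairs):
    """Heuristic estimate: phi(target | source) = count(source, target) / count(source)."""
    pair_counts = Counter(phrase_pairs)
    source_counts = Counter(source for source, _ in phrase_pairs)
    table = defaultdict(dict)
    for (source, target), count in pair_counts.items():
        table[source][target] = count / source_counts[source]
    return dict(table)


def interpolate(heuristic, learned, lam=0.5):
    """Linear mixture lam * heuristic + (1 - lam) * learned (an assumed scheme).

    Each source phrase's distribution is renormalized, so a phrase that
    appears in only one table keeps that table's distribution.
    """
    mixed = {}
    for source in set(heuristic) | set(learned):
        h = heuristic.get(source, {})
        g = learned.get(source, {})
        raw = {t: lam * h.get(t, 0.0) + (1 - lam) * g.get(t, 0.0)
               for t in set(h) | set(g)}
        total = sum(raw.values())
        mixed[source] = {t: p / total for t, p in raw.items()}
    return mixed


def entropy(dist):
    """Shannon entropy in bits; lower values mean a more determinized entry."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


# Toy, made-up phrase pairs (not data from the paper).
pairs = [("le chat", "the cat")] * 3 + [("le chat", "cat")]
heuristic = relative_frequency(pairs)

# Hypothetical stand-in for EM-trained weights that have collapsed
# onto a single translation for this source phrase.
learned = {"le chat": {"the cat": 1.0}}

mixed = interpolate(heuristic, learned, lam=0.5)
for name, table in [("heuristic", heuristic), ("learned", learned), ("mixed", mixed)]:
    print(name, table["le chat"], round(entropy(table["le chat"]), 3))
```

In this toy case the mixed entry keeps some probability mass on the alternative translation that the peaked table discarded, which is the kind of generalization the determinized phrase table loses.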