A discriminative global training algorithm for statistical MT

Authors:
Christoph Tillmann;Tong Zhang
Affiliations:
IBM T.J. Watson Research Center, Yorktown Heights, N.Y.;Yahoo! Research, New York City, N.Y.
Venue:
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Year:
2006

Citing 10
Cited 22

The mathematics of statistical machine translation: parameter estimation

Computational Linguistics - Special issue on using large corpora: II
BLEU: a method for automatic evaluation of machine translation

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Statistical phrase-based translation

NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Minimum error rate training in statistical machine translation

ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Discriminative training methods for hidden Markov models: theory and experiments with perceptron algorithms

EMNLP '02 Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10
Online large-margin training of dependency parsers

ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
A hierarchical phrase-based model for statistical machine translation

ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
A localized prediction model for statistical machine translation

ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
A maximum entropy word aligner for Arabic-English machine translation

HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Local phrase reordering models for statistical machine translation

HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing

A block bigram prediction model for statistical machine translation

ACM Transactions on Speech and Language Processing (TSLP)
Statistical machine translation

ACM Computing Surveys (CSUR)
Online large-margin training of syntactic and structural translation features

EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Decomposability of translation metrics for improved evaluation and efficient algorithms

EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
11,001 new features for statistical machine translation

NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Generalizing local translation models

SSST '08 Proceedings of the Second Workshop on Syntax and Structure in Statistical Translation
Comparing reordering constraints for SMT using efficient Bleu oracle computation

SSST '07 Proceedings of the NAACL-HLT 2007/AMTA Workshop on Syntax and Structure in Statistical Translation
Using paraphrases for parameter tuning in statistical machine translation

StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
Mixture-model adaptation for SMT

StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
Context-aware discriminative phrase selection for statistical machine translation

StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
Stochastic gradient descent training for L1-regularized log-linear models with cumulative penalty

ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1
Effective use of linguistic and contextual information for statistical machine translation

EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1 - Volume 1
Accuracy-based scoring for DOT: towards direct error minimization for data-oriented translation

EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1 - Volume 1
Assessing phrase-based translation models with oracle decoding

EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Discriminative feature-tied mixture modeling for statistical machine translation

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Integrating source-language context into phrase-based statistical machine translation

Machine Translation
Syntactic discriminative language model rerankers for statistical machine translation

Machine Translation
Fast generation of translation forest for large-scale SMT discriminative training

EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Hope and fear for discriminative training of statistical translation models

The Journal of Machine Learning Research
Joint feature selection in distributed stochastic learning for large-scale discriminative training in SMT

ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
Optimization strategies for online large-margin learning in machine translation

WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
Oracle decoding as a new way to analyze phrase-based machine translation

Machine Translation

Quantified Score

Hi-index	0.01

Visualization

Abstract

This paper presents a novel training algorithm for a linearly-scored block sequence translation model. The key component is a new procedure to directly optimize the global scoring function used by a SMT decoder. No translation, language, or distortion model probabilities are used as in earlier work on SMT. Therefore our method, which employs less domain specific knowledge, is both simpler and more extensible than previous approaches. Moreover, the training procedure treats the decoder as a black-box, and thus can be used to optimize any decoding scheme. The training algorithm is evaluated on a standard Arabic-English translation task.