Segment choice models: feature-rich models for global distortion in statistical machine translation

Authors:
Roland Kuhn; Denis Yuen; Michel Simard; Patrick Paul; George Foster; Eric Joanis; Howard Johnson
Affiliations:
National Research Council of Canada, Gatineau, Québec, Canada (all authors)
Venue:
HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics
Year:
2006

Abstract

This paper presents a new approach to distortion (phrase reordering) in phrase-based machine translation (MT). Distortion is modeled as a sequence of choices made during translation, each choice selecting which source segment to translate next. The approach yields trainable, probabilistic distortion models that are global: they assign a probability to each possible phrase reordering. These "segment choice" models (SCMs) can be trained on "segment-aligned" sentence pairs and can be applied during decoding or rescoring. The approach also yields a metric called "distortion perplexity" ("disperp") for comparing SCMs offline on test data, analogous to perplexity for language models. A decision-tree-based SCM is tested on Chinese-to-English translation and outperforms a baseline distortion penalty approach at the 99% confidence level.
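The sequence-of-choices view and the disperp metric can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `distance_scm` is a hypothetical distance-based stand-in for the paper's decision-tree SCM, the `(last, remaining) -> probabilities` interface is invented for illustration, and disperp is assumed here to be the exponentiated average negative log-probability per segment choice, by analogy with language-model perplexity.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical SCM interface: given the last chosen source segment and the
# still-untranslated segments, return P(next segment = j) for each candidate j.
ChoiceModel = Callable[[int, Sequence[int]], Dict[int, float]]


def distance_scm(alpha: float) -> ChoiceModel:
    """Toy SCM (an assumption, NOT the paper's decision-tree SCM):
    P(next = j) is proportional to exp(-alpha * |j - (last + 1)|),
    so monotone (left-to-right) choices are favoured when alpha > 0."""
    def model(last: int, remaining: Sequence[int]) -> Dict[int, float]:
        weights = {j: math.exp(-alpha * abs(j - (last + 1))) for j in remaining}
        total = sum(weights.values())
        return {j: w / total for j, w in weights.items()}
    return model


def order_log_prob(order: Sequence[int], model: ChoiceModel) -> Tuple[float, int]:
    """Score a full reordering (a permutation of source segment indices) as the
    product of its per-step choice probabilities. Returns (log prob, #choices)."""
    remaining: List[int] = sorted(order)
    last = -1  # sentinel: nothing translated yet
    logp = 0.0
    for seg in order:
        probs = model(last, remaining)
        logp += math.log(probs[seg])
        remaining.remove(seg)
        last = seg
    return logp, len(order)


def disperp(orders: Sequence[Sequence[int]], model: ChoiceModel) -> float:
    """Distortion perplexity over reference orders (assumed normalisation:
    per segment choice, including the final forced choice)."""
    total_logp, n_choices = 0.0, 0
    for order in orders:
        lp, k = order_log_prob(order, model)
        total_logp += lp
        n_choices += k
    return math.exp(-total_logp / n_choices)


if __name__ == "__main__":
    # With alpha = 0 every remaining segment is equally likely, so a
    # 3-segment order has probability 1/6 and disperp equals 6 ** (1/3).
    print(disperp([[0, 1, 2]], distance_scm(0.0)))
    # A model preferring monotone order gives a lower disperp on monotone data.
    print(disperp([[0, 1, 2]], distance_scm(1.0)))
```

As with language-model perplexity, a lower disperp on held-out segment-aligned data means the SCM assigns higher probability to the reference reorderings, which is what allows SCMs to be compared offline before any decoding run.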