HMM Word and Phrase Alignment for Statistical Machine Translation

Authors:
Yonggang Deng;W. Byrne
Affiliations:
IBM, Yorktown Heights;-
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2008

Citing 0
Cited 9

Efficient path counting transducers for minimum Bayes-risk decoding of statistical machine translation lattices
ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers

Hierarchical phrase-based translation grammars extracted from alignment posterior probabilities
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

Hierarchical phrase-based translation with weighted finite-state transducers and shallow-n grammars
Computational Linguistics

Explicit length modelling for statistical machine translation
IbPRIA'11 Proceedings of the 5th Iberian conference on Pattern recognition and image analysis

Hierarchical phrase-based translation representations
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing

Explicit length modelling for statistical machine translation
Pattern Recognition

A hidden Markov model for collaborative filtering
MIS Quarterly

N-gram posterior probability confidence measures for statistical machine translation: an empirical study
Machine Translation

Maximum-entropy word alignment and posterior-based phrase extraction for machine translation
Machine Translation

Abstract

Estimation and alignment procedures for word and phrase alignment hidden Markov models (HMMs) are developed for the alignment of parallel text. The development of these models is motivated by an analysis of the desirable features of IBM Model 4, one of the original and most effective models for word alignment. These models are formulated to capture the desirable aspects of Model 4 in an HMM alignment formalism. Alignment behavior is analyzed and compared to human-generated reference alignments, and the ability of these models to capture different types of alignment phenomena is evaluated. In analyzing alignment performance, Chinese-English word alignments are shown to be comparable to those of IBM Model 4 even when models are trained over large parallel texts. In translation performance, phrase-based statistical machine translation systems based on these HMM alignments can equal and exceed systems based on Model 4 alignments, and this is shown in Arabic-English and Chinese-English translation. These alignment models can also be used to generate posterior statistics over collections of parallel text, and this is used to refine and extend phrase translation tables with a resulting improvement in translation quality.
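
As a rough illustration of the posterior statistics mentioned above, the following sketch computes link posteriors for one sentence pair with the forward-backward algorithm. It assumes a plain word-to-word HMM alignment model, which is simpler than the word and phrase models of the paper. The function name `link_posteriors`, the `emit` and `jump` tables, the uniform initial distribution, and the `floor` value are all hypothetical choices made for this example, not taken from the paper.

```python
from typing import Dict, List, Tuple


def link_posteriors(
    src: List[str],
    tgt: List[str],
    emit: Dict[Tuple[str, str], float],
    jump: Dict[int, float],
    floor: float = 1e-6,
) -> List[List[float]]:
    """Return post[j][i] = P(a_j = i | src, tgt) for a word-to-word HMM.

    emit[(s, t)] is an assumed translation probability p(t | s);
    jump[d] is an assumed weight for moving d source positions.
    Entries missing from either table fall back to `floor`.
    """
    I, J = len(src), len(tgt)
    if I == 0 or J == 0:
        raise ValueError("both sentences must be non-empty")

    # Transition matrix: jump weights renormalised over valid source positions.
    trans = []
    for prev in range(I):
        weights = [jump.get(i - prev, floor) for i in range(I)]
        total = sum(weights)
        trans.append([w / total for w in weights])

    # Emission scores e[j][i] = p(tgt[j] | src[i]).
    e = [[emit.get((src[i], tgt[j]), floor) for i in range(I)] for j in range(J)]

    # Forward pass with a uniform initial distribution over source positions.
    alpha = [[0.0] * I for _ in range(J)]
    for i in range(I):
        alpha[0][i] = e[0][i] / I
    for j in range(1, J):
        for i in range(I):
            alpha[j][i] = e[j][i] * sum(
                alpha[j - 1][k] * trans[k][i] for k in range(I)
            )

    # Backward pass.
    beta = [[1.0] * I for _ in range(J)]
    for j in range(J - 2, -1, -1):
        for k in range(I):
            beta[j][k] = sum(
                trans[k][i] * e[j + 1][i] * beta[j + 1][i] for i in range(I)
            )

    # Normalise by the sentence-pair likelihood.
    # No scaling is applied, so this is only suitable for short sentences.
    z = sum(alpha[J - 1])
    return [[alpha[j][i] * beta[j][i] / z for i in range(I)] for j in range(J)]


# Toy parameters, invented purely for illustration.
toy_emit = {
    ("das", "the"): 0.9, ("haus", "house"): 0.9,
    ("das", "house"): 0.1, ("haus", "the"): 0.1,
}
toy_jump = {-1: 0.2, 0: 0.2, 1: 0.6}
toy_post = link_posteriors(["das", "haus"], ["the", "house"], toy_emit, toy_jump)
```

Each row of `toy_post` is a distribution over source positions for one target word. Summing such posteriors over a parallel corpus is one plausible way to obtain corpus-level link statistics, although the abstract does not spell out how the paper's phrase-table refinement is done.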