What types of word alignment improve statistical machine translation?

Authors:
Patrik Lambert;Simon Petitrenaud;Yanjun Ma;Andy Way
Affiliations:
LIUM, LUNAM Université, Univeristy of Le Mans, Le Mans Cedex 9, France 72085;LIUM, LUNAM Université, Univeristy of Le Mans, Le Mans Cedex 9, France 72085;Baidu Inc., Beijing, China;Applied Language Solutions, Delph, UK
Venue:
Machine Translation
Year:
2012

Citing 19
Cited 1

Random fuzzy variables of second order and applications to statistical inference

Information Sciences: an International Journal - Fuzzy random variables
A systematic comparison of various statistical alignment models

Computational Linguistics
Models of translational equivalence among words

Computational Linguistics
The mathematics of statistical machine translation: parameter estimation

Computational Linguistics - Special issue on using large corpora: II
BLEU: a method for automatic evaluation of machine translation

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Statistical phrase-based translation

NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Minimum error rate training in statistical machine translation

ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
The Alignment Template Approach to Statistical Machine Translation

Computational Linguistics
Going beyond AER: an extensive analysis of word alignments and their impact on MT

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
A discriminative framework for bilingual word alignment

HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Alignment by agreement

HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
N-gram-based Machine Translation

Computational Linguistics
Measuring Word Alignment Quality for Statistical Machine Translation

Computational Linguistics
Improving statistical MT by coupling reordering and decoding

Machine Translation
Moses: open source toolkit for statistical machine translation

ACL '07 Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions
Discriminative alignment training without annotated data for machine translation

NAACL-Short '07 Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers
Discriminative word alignment by linear modeling

Computational Linguistics
Better hypothesis testing for statistical machine translation: controlling for optimizer instability

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Improving phrase-based statistical translation through combination of word alignments

FinTAL'06 Proceedings of the 5th international conference on Advances in Natural Language Processing

Maximum-entropy word alignment and posterior-based phrase extraction for machine translation

Machine Translation

Quantified Score

Hi-index	0.00

Visualization

Abstract

In most statistical machine translation (SMT) systems, bilingual segments are extracted via word alignment. However, there is a need for systematic study as to what alignment characteristics can benefit MT under specific experimental settings such as the type of MT system, the language pair or the type or size of the corpus. In this paper we perform, in each of these experimental settings, a statistical analysis of the data and study the sample correlation coefficients between a number of alignment or phrase table characteristics and variables such as the phrase table size, the number of untranslated words or the BLEU score. We report results for two different SMT systems (a phrase-based and an n-gram-based system) on Chinese-to-English FBIS and BTEC data, and Spanish-to-English European Parliament data. We find that the alignment characteristics which help in translation greatly depend on the MT system and on the corpus size. We give alignment hints to improve BLEU score, depending on the SMT system used and the type of corpus. For example, for phrase-based SMT, dense alignments are required with larger corpora, especially on the target side, while with smaller corpora, more precise, sparser alignments are better, especially on the source side. Avoiding some long-distance crossing links may also improve BLEU score with small corpora. We take these conclusions into account to modify two types of alignment systems, and get 1 to 1.6 % relative improvements in BLEU score on two held-out corpora, although the improved system is different in each corpus.