SyMGiza++: symmetrized word alignment models for statistical machine translation

Authors:
Marcin Junczys-Dowmunt;Arkadiusz Szał
Affiliations:
Faculty of Mathematics and Computer Science, Adam Mickiewicz University, Poznań, Poland;Faculty of Mathematics and Computer Science, Adam Mickiewicz University, Poznań, Poland
Venue:
SIIS'11 Proceedings of the 2011 international conference on Security and Intelligent Information Systems
Year:
2011

Abstract

SyMGiza++, a tool that computes symmetric word alignment models and can take advantage of multi-processor systems, is presented. A series of fairly simple modifications to the original IBM/Giza++ word alignment models allows the symmetrized models to be updated between chosen iterations of the original training algorithms. We achieve a relative alignment quality improvement of more than 17% compared to Giza++ and MGiza++ on the standard Canadian Hansards task, while maintaining the speed improvements provided by the parallel computation capability of MGiza++. Furthermore, the alignment models are evaluated in the context of phrase-based statistical machine translation, where a consistent improvement in BLEU scores is observed when SyMGiza++ is used instead of Giza++ or MGiza++.
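
The abstract does not name the metric behind the "relative alignment quality improvement". A common choice for the Canadian Hansards task is the alignment error rate (AER), computed against sure and possible gold links. The sketch below assumes AER is the metric and shows how a relative improvement would be derived from two AER values. The function names and the toy link sets are illustrative only and are not taken from the paper.

```python
def aer(predicted, sure, possible):
    """Alignment error rate (lower is better).

    Links are (source_index, target_index) pairs. Sure links are
    treated as a subset of the possible links.
    """
    a = set(predicted)
    s = set(sure)
    p = set(possible) | s  # enforce S subset of P
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))


def relative_improvement(baseline_aer, new_aer):
    """Fraction by which the error rate drops relative to the baseline."""
    return (baseline_aer - new_aer) / baseline_aer


# Toy gold standard and two hypothetical system outputs (not real data).
sure = {(0, 0), (1, 1)}
possible = {(0, 0), (1, 1), (2, 1)}
baseline_links = {(0, 0), (1, 2), (2, 1)}
improved_links = {(0, 0), (1, 1), (2, 1)}

baseline_score = aer(baseline_links, sure, possible)
improved_score = aer(improved_links, sure, possible)
gain = relative_improvement(baseline_score, improved_score)
```

Under this reading, "more than 17%" would mean the new error rate is below 83% of the baseline error rate; it would not mean a drop of 17 absolute points.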