Parallel implementations of word alignment tool

Authors:
Qin Gao;Stephan Vogel
Affiliations:
Carnegie Mellon University, Pittsburgh, PA;Carnegie Mellon University, Pittsburgh, PA
Venue:
SETQA-NLP '08 Software Engineering, Testing, and Quality Assurance for Natural Language Processing
Year:
2008

Abstract

Training word alignment models on large corpora is a very time-consuming process. This paper describes two parallel implementations of GIZA++ that accelerate word alignment training. One implementation runs on computer clusters; the other runs on a multi-processor system using multi-threading. Results show a speed-up that is near-linear in the number of CPUs used, with alignment quality preserved.
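
The general idea of parallelizing alignment training can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes IBM Model 1 (one of the models GIZA++ trains) as a stand-in, and every name in it (`collect_counts`, `merge`, `em_iteration`, `train`, the `mapper` parameter) is hypothetical. The corpus is split into shards, expected counts are collected on each shard independently, and the partial counts are summed before the translation table is re-normalized. Because the merged counts equal those of a single pass over the whole corpus, the resulting parameters match the serial run up to floating-point rounding.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def collect_counts(shard, t):
    """E-step on one shard: expected counts for IBM Model 1 (illustrative)."""
    counts = defaultdict(float)   # expected count of (source word, target word)
    totals = defaultdict(float)   # expected count of each source word
    for src, tgt in shard:
        for f in tgt:
            norm = sum(t[(e, f)] for e in src)
            for e in src:
                p = t[(e, f)] / norm
                counts[(e, f)] += p
                totals[e] += p
    return counts, totals


def merge(partials):
    """Sum the partial counts produced by all shards."""
    counts, totals = defaultdict(float), defaultdict(float)
    for c, s in partials:
        for key, value in c.items():
            counts[key] += value
        for key, value in s.items():
            totals[key] += value
    return counts, totals


def em_iteration(shards, t, mapper=map):
    """One EM iteration: count collection per shard, then a single M-step."""
    partials = mapper(lambda shard: collect_counts(shard, t), shards)
    counts, totals = merge(partials)
    return {(e, f): c / totals[e] for (e, f), c in counts.items()}


def train(corpus, n_shards=2, iterations=5, mapper=map):
    """Train t(f|e) on a list of (source tokens, target tokens) pairs."""
    shards = [corpus[i::n_shards] for i in range(n_shards)]
    # Uniform start over every co-occurring word pair.
    t = {(e, f): 1.0 for src, tgt in corpus for e in src for f in tgt}
    for _ in range(iterations):
        t = em_iteration(shards, t, mapper)
    return t


if __name__ == "__main__":
    corpus = [
        (["the", "house"], ["das", "haus"]),
        (["the", "book"], ["das", "buch"]),
        (["a", "book"], ["ein", "buch"]),
    ]
    with ThreadPoolExecutor(max_workers=2) as pool:
        table = train(corpus, n_shards=2, iterations=5, mapper=pool.map)
    for pair in sorted(table):
        print(pair, round(table[pair], 3))
```

The `mapper` argument is the only point where parallelism enters, so the same code can be driven by the built-in `map`, a thread pool, or a process pool. In CPython, threads will not speed up this pure-Python loop because of the global interpreter lock; the thread pool is used here only to keep the sketch short and show the structure.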