Improved word alignment with statistics and linguistic heuristics

Authors:
Ulf Hermjakob
Affiliations:
University of Southern California, Marina del Rey, CA
Venue:
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1 - Volume 1
Year:
2009

Citing 14
Cited 10

A systematic comparison of various statistical alignment models

Computational Linguistics
wEBMT: developing and validating an example-based machine translation system using the world wide web

Computational Linguistics - Special issue on web as corpus
Models of translational equivalence among words

Computational Linguistics
The mathematics of statistical machine translation: parameter estimation

Computational Linguistics - Special issue on using large corpora: II
BLEU: a method for automatic evaluation of machine translation

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Sentence level discourse parsing using syntactic and lexical information

NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
The Alignment Template Approach to Statistical Machine Translation

Computational Linguistics
Scalable inference and training of context-rich syntactic translation models

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
A maximum entropy word aligner for Arabic-English machine translation

HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Soft syntactic constraints for word alignment through discriminative training

COLING-ACL '06 Proceedings of the COLING/ACL on Main conference poster sessions
Measuring Word Alignment Quality for Statistical Machine Translation

Computational Linguistics
Automatic generation of parallel treebanks

COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
Syntax-driven learning of sub-sentential translation equivalents and translation rules from parsed parallel corpora

SSST '08 Proceedings of the Second Workshop on Syntax and Structure in Statistical Translation
Using syntax to improve word alignment precision for syntax-based machine translation

StatMT '08 Proceedings of the Third Workshop on Statistical Machine Translation

Incorporating Linguistic Information to Statistical Word-Level Alignment

CIARP '09 Proceedings of the 14th Iberoamerican Conference on Pattern Recognition: Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications
Diversify and combine: improving word alignment for machine translation on low-resource languages

ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers
Improving Arabic-to-English statistical machine translation by reordering post-verbal subjects for alignment

ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers
Discriminative word alignment with a function word reordering model

EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
An algorithm for unsupervised transliteration mining with an application to word alignment

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Feature-rich language-independent syntax-based alignment for statistical machine translation

EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Improved Arabic-to-English statistical machine translation by reordering post-verbal subjects for word alignment

Machine Translation
Learning better rule extraction with translation span alignment

ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2
Re-training monolingual parser bilingually for syntactic SMT

EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Improving function word alignment with frequency and syntactic information

IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence

Quantified Score

Hi-index	0.00

Visualization

Abstract

We present a method to align words in a bitext that combines elements of a traditional statistical approach with linguistic knowledge. We demonstrate this approach for Arabic-English, using an alignment lexicon produced by a statistical word aligner, as well as linguistic resources ranging from an English parser to heuristic alignment rules for function words. These linguistic heuristics have been generalized from a development corpus of 100 parallel sentences. Our aligner, Ualign, outperforms both the commonly used GIZA++ aligner and the state-of-the-art LEAF aligner on F-measure and produces superior scores in end-to-end statistical machine translation, +1.3 Bleu points over GIZA++, and +0.7 over LEAF.