Diversify and combine: improving word alignment for machine translation on low-resource languages

Authors:
Bing Xiang;Yonggang Deng;Bowen Zhou
Affiliations:
IBM T. J. Watson Research Center, Yorktown Heights, NY;IBM T. J. Watson Research Center, Yorktown Heights, NY;IBM T. J. Watson Research Center, Yorktown Heights, NY
Venue:
ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers
Year:
2010

Abstract

We present a novel method that improves word alignment quality, and ultimately translation performance, for low-resource languages by producing and combining complementary word alignments. Instead of focusing on improving a single set of word alignments, we generate multiple sets of diversified alignments based on different motivations, such as linguistic knowledge, morphology, and heuristics. We demonstrate this approach on an English-to-Pashto translation task by combining the alignments obtained from syntactic reordering, stemming, and partial words. The combined alignment outperforms the baseline alignment, with significantly higher F-scores and better translation performance.
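
The idea of combining several alignments and scoring the result by F-score can be illustrated with a minimal sketch. The abstract does not say how the alignments are combined, so the majority-vote rule below is an assumption chosen for simplicity, not the authors' method. The names `combine_by_vote` and `f_score` are hypothetical, and the link sets are made-up toy data, not results from the paper.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

# An alignment link pairs a source word index with a target word index.
Link = Tuple[int, int]


def combine_by_vote(alignments: Iterable[Set[Link]], min_votes: int) -> Set[Link]:
    """Keep every link proposed by at least `min_votes` of the input alignments.

    Assumption: simple voting stands in for whatever combination
    scheme the paper actually uses.
    """
    votes = Counter(link for alignment in alignments for link in alignment)
    return {link for link, count in votes.items() if count >= min_votes}


def f_score(predicted: Set[Link], gold: Set[Link]) -> float:
    """Balanced F-score (harmonic mean of precision and recall) over links."""
    correct = len(predicted & gold)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy link sets for one sentence pair (illustrative only).
reordered = {(0, 0), (1, 2), (2, 1)}          # from syntactic reordering
stemmed = {(0, 0), (1, 2), (3, 3)}            # from stemming
partial = {(0, 0), (2, 1), (3, 3), (1, 1)}    # from partial words
gold = {(0, 0), (1, 2), (2, 1), (3, 3)}       # hand-aligned reference

combined = combine_by_vote([reordered, stemmed, partial], min_votes=2)

for name, links in [("reordered", reordered), ("stemmed", stemmed),
                    ("partial", partial), ("combined", combined)]:
    print(f"{name:10s} F = {f_score(links, gold):.3f}")
```

In this toy case each individual alignment misses or adds a link, while the voted combination recovers the full reference set. Raising `min_votes` trades recall for precision; lowering it to 1 gives the plain union.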