Word alignment combination over multiple word segmentation

Authors:
Ning Xi;Guangchao Tang;Boyuan Li;Yinggong Zhao
Affiliations:
Nanjing University, Nanjing, China;Nanjing University, Nanjing, China;Nanjing University, Nanjing, China;Nanjing University, Nanjing, China
Venue:
HLT-SS '11 Proceedings of the ACL 2011 Student Session
Year:
2011

References:
The mathematics of statistical machine translation: parameter estimation. Computational Linguistics - Special issue on using large corpora: II
Statistical phrase-based translation. NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Bilingually motivated domain-adapted word segmentation for statistical machine translation. EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
Using a maximum entropy model to build segmentation lattices for MT. NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Improved statistical machine translation by multiple Chinese word segmentation. StatMT '08 Proceedings of the Third Workshop on Statistical Machine Translation
Optimizing Chinese word segmentation for machine translation performance. StatMT '08 Proceedings of the Third Workshop on Statistical Machine Translation
Better word alignments with supervised ITG models. ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2
Confidence measure for word alignment. ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2
Unsupervised tokenization for machine translation. EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2
Diversify and combine: improving word alignment for machine translation on low-resource languages. ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers
Joint tokenization and translation. COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Discriminative word alignment by linear modeling. Computational Linguistics

Cited by:
Enhancing statistical machine translation with character alignment. ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2

Abstract

In this paper, we present a new word alignment combination approach for language pairs in which one language has no explicit word boundaries. Instead of combining word alignments produced by different models (Xiang et al., 2010), we combine word alignments over multiple monolingually motivated word segmentations. Our approach is based on a link confidence score defined over multiple segmentations, so the combined alignment is more robust to inappropriate word segmentation. Our combination algorithm is simple, efficient, and easy to implement. In a Chinese-English experiment, our approach improved both word alignment quality and translation performance on all segmentations simultaneously, which shows that word alignment can benefit from the complementary knowledge provided by diverse, monolingually motivated segmentations.
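
The abstract does not spell out how the link confidence score is computed, so the following is only a minimal sketch of one plausible reading, not the authors' algorithm. It assumes that word-level links from each segmentation are projected onto source characters, that a character-level link's confidence is the fraction of segmentations that propose it, and that a word-level link in the chosen segmentation is kept when the average confidence over its characters reaches a threshold. The names `char_spans`, `char_links`, `combine`, and `threshold` are hypothetical, as is the toy example.

```python
from collections import Counter


def char_spans(words):
    """Return the (start, end) character offsets of each word."""
    spans, pos = [], 0
    for w in words:
        spans.append((pos, pos + len(w)))
        pos += len(w)
    return spans


def char_links(words, alignment):
    """Project word-level links (src_word, tgt_word) onto source characters."""
    spans = char_spans(words)
    return {(c, j) for i, j in alignment for c in range(*spans[i])}


def combine(segmentations, alignments, target_words, threshold=0.5):
    """Combine alignments from several segmentations onto `target_words`.

    Assumption: confidence of a character-level link is the share of
    segmentations whose alignment contains it; a word-level link is kept
    when the mean confidence over the word's characters >= threshold.
    """
    votes = Counter()
    for words, alignment in zip(segmentations, alignments):
        votes.update(char_links(words, alignment))
    n = len(segmentations)
    tgt_ids = {j for _, j in votes}
    combined = set()
    for i, (start, end) in enumerate(char_spans(target_words)):
        for j in tgt_ids:
            conf = sum(votes[(c, j)] for c in range(start, end)) / (n * (end - start))
            if conf >= threshold:
                combined.add((i, j))
    return combined


# Toy example (hypothetical data): three segmentations of the same four
# characters, aligned to the English words "Nanjing" (0) and "University" (1).
segs = [["南京", "大学"], ["南京大学"], ["南", "京", "大学"]]
aligns = [{(0, 0), (1, 1)}, {(0, 0), (0, 1)}, {(0, 0), (1, 0), (2, 1)}]
result = combine(segs, aligns, target_words=segs[0])
print(sorted(result))
```

Working at the character level is one way to make links comparable across segmentations that disagree on word boundaries; other normalizations of the score are equally conceivable.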