How much can we gain from supervised word alignment?

  • Authors:
  • Jinxi Xu; Jinying Chen

  • Affiliations:
  • Raytheon BBN Technologies, Cambridge, MA; Raytheon BBN Technologies, Cambridge, MA

  • Venue:
  • HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
  • Year:
  • 2011

Abstract

Word alignment is a central problem in statistical machine translation (SMT). In recent years, supervised alignment algorithms, which improve alignment accuracy by mimicking human alignment, have attracted a great deal of attention. The objective of this work is to explore the performance limit of supervised alignment under the current SMT paradigm. Our experiments used a manually aligned Chinese-English corpus with 280K words recently released by the Linguistic Data Consortium (LDC). We treated the human alignment as the oracle of supervised alignment. The result is surprising: the gain of human alignment over a state-of-the-art unsupervised method (GIZA++) is less than 1 point in BLEU. Furthermore, we showed that the benefit of improved alignment becomes smaller with more training data, implying that the above limit also holds for large training conditions.
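For readers unfamiliar with the task, a word alignment is simply a set of links between source and target token positions in a sentence pair. The toy Python sketch below (hypothetical sentence pair and alignments, not taken from the paper's corpus) shows how such links are commonly represented and how an automatic alignment could be scored against a human gold alignment; note that the paper itself evaluates extrinsically, by the BLEU score of the resulting SMT system, rather than with this kind of intrinsic metric.

    # Toy illustration: a word alignment is a set of
    # (source_index, target_index) links between the tokens of a sentence pair.
    src = ["我", "喜欢", "音乐"]   # hypothetical Chinese sentence
    tgt = ["I", "like", "music"]   # its English translation

    # Hypothetical human (gold) alignment, in the spirit of the LDC annotation.
    gold = {(0, 0), (1, 1), (2, 2)}

    # Hypothetical output of an automatic aligner such as GIZA++.
    auto = {(0, 0), (1, 1), (1, 2)}

    # Intrinsic agreement with the gold standard (precision / recall / F1).
    tp = len(auto & gold)
    precision = tp / len(auto)
    recall = tp / len(gold)
    f1 = 2 * precision * recall / (precision + recall)
    print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")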