Exploiting variant corpora for machine translation

  • Authors:
  • Michael Paul; Eiichiro Sumita

  • Affiliations:
  • National Institute of Information and Communications Technology and ATR Spoken Language Communication Research Labs, Keihanna Science City, Kyoto (both authors)

  • Venue:
  • NAACL-Short '06 Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers
  • Year:
  • 2006

Abstract

This paper proposes the use of variant corpora, i.e., parallel text corpora that are equal in meaning but express the content in different ways, to improve corpus-based machine translation. Training on multiple corpora that share the same content but differ in source yields variant models, each focusing on the specific linguistic phenomena covered by its corpus. The proposed method applies each variant model separately, producing multiple translation hypotheses that are selectively combined according to statistical models. This approach outperforms the conventional approach of merging all variants into a single training corpus, because it reduces translation ambiguities and exploits the strengths of each variant model.
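
The selection step described in the abstract can be illustrated with a minimal sketch: decode the input with each variant model separately, then keep the hypothesis with the highest statistical score. The function and class names below (`translate_with_variant`, `Hypothesis`, the placeholder scoring) are hypothetical stand-ins, not the paper's actual decoder or selection criteria, which the abstract does not detail.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str        # candidate translation
    model_id: str    # which variant model produced it
    score: float     # statistical model score (e.g., combined TM/LM score)


def translate_with_variant(model_id: str, source: str) -> Hypothesis:
    """Stand-in for decoding `source` with one variant model.

    A real system would run an SMT decoder trained on the corresponding
    variant corpus and return its hypothesis together with a model score.
    """
    text = f"<translation of '{source}' by {model_id}>"
    score = -len(source) / (1 + len(model_id))  # placeholder score only
    return Hypothesis(text=text, model_id=model_id, score=score)


def select_best(source: str, variant_model_ids: list[str]) -> Hypothesis:
    """Apply each variant model separately; keep the best-scoring hypothesis."""
    hypotheses = [translate_with_variant(m, source) for m in variant_model_ids]
    return max(hypotheses, key=lambda h: h.score)


if __name__ == "__main__":
    best = select_best("kono densha wa kyoto yuki desu ka",
                       ["variant_A", "variant_B"])
    print(best.model_id, best.text)
```

The key design point, per the abstract, is that the variant models are kept separate rather than merged into one training corpus, so selection can pick whichever model happens to cover the linguistic phenomena of a given input best.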