Boosting statistical machine translation by lemmatization and linear interpolation

  • Authors:
  • Ruiqiang Zhang;Eiichiro Sumita

  • Affiliations:
  • National Institute of Information and Communications Technology and ATR Spoken Language Communication Research Laboratories, Soraku-gun, Kyoto, Japan;National Institute of Information and Communications Technology and ATR Spoken Language Communication Research Laboratories, Soraku-gun, Kyoto, Japan

  • Venue:
  • ACL '07 Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions
  • Year:
  • 2007

Abstract

Data sparseness is one of the factors that degrade statistical machine translation (SMT). Existing work has shown that using morpho-syntactic information is an effective remedy for data sparseness. However, few efforts have been made to apply English morpho-syntactic analysis to Chinese-to-English SMT. We found that, although English has relatively little inflection, using English lemmas in training can significantly improve the quality of word alignment, which in turn yields better translation performance. We carried out comprehensive experiments on training sets of varied sizes to verify this. We also propose a new, effective linear interpolation method to integrate multiple homologous features of translation models.
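The abstract does not give the interpolation formula, so the following is only a minimal sketch of generic linear interpolation and not the authors' exact method. It assumes that "homologous features" means the same kind of score (here, a phrase-translation probability) estimated by different models, for instance one trained on surface forms and one trained on lemmas, and that the weights are non-negative and sum to 1. The function name `interpolate_features`, the table layout, and the example values are all hypothetical.

```python
from typing import Dict, List, Tuple

# A phrase pair is (source phrase, target phrase); this layout is an assumption.
PhrasePair = Tuple[str, str]


def interpolate_features(
    tables: List[Dict[PhrasePair, float]],
    weights: List[float],
) -> Dict[PhrasePair, float]:
    """Combine homologous feature tables as a weighted sum.

    A pair missing from one table contributes 0 for that table,
    which is a simplifying assumption of this sketch.
    """
    if len(tables) != len(weights):
        raise ValueError("need exactly one weight per table")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")

    combined: Dict[PhrasePair, float] = {}
    for table, weight in zip(tables, weights):
        for pair, prob in table.items():
            combined[pair] = combined.get(pair, 0.0) + weight * prob
    return combined


# Illustrative, made-up probabilities (not results from the paper).
surface_table = {("shu", "book"): 0.6, ("shu", "books"): 0.2}
lemma_table = {("shu", "book"): 0.9}

mixed = interpolate_features([surface_table, lemma_table], [0.5, 0.5])
```

In practice the weights would be tuned on held-out data; the equal weights above are used only to keep the example easy to follow.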