Boosting statistical machine translation by lemmatization and linear interpolation

Authors:
Ruiqiang Zhang;Eiichiro Sumita
Affiliations:
National Institute of Information and Communications Technology and ATR Spoken Language Communication Research Laboratories, Soraku-gun, Kyoto, Japan;National Institute of Information and Communications Technology and ATR Spoken Language Communication Research Laboratories, Soraku-gun, Kyoto, Japan
Venue:
ACL '07 Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions
Year:
2007

Abstract

Data sparseness is one of the factors that degrade statistical machine translation (SMT). Existing work has shown that morpho-syntactic information is an effective remedy for data sparseness. However, little work has applied English morpho-syntactic analysis to Chinese-to-English SMT. We found that although English is only weakly inflected, training on English lemmas significantly improves word alignment quality, which in turn yields better translation performance. We demonstrate this through comprehensive experiments on training data sets of various sizes. We also propose a new, effective linear interpolation method for integrating multiple homologous features of translation models.
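The abstract does not give the interpolation formula, so the following is only a minimal sketch of generic linear interpolation, not the authors' exact method. It assumes that each translation model exposes the same ("homologous") feature, such as a phrase translation probability, as a dictionary keyed by (Chinese phrase, English phrase), and that the interpolation weights are non-negative, sum to one, and are supplied by the caller. The function name `interpolate_feature` and the example tables are hypothetical and chosen purely for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: (Chinese source phrase, English target phrase)
PhrasePair = Tuple[str, str]


def interpolate_feature(
    tables: List[Dict[PhrasePair, float]], weights: List[float]
) -> Dict[PhrasePair, float]:
    """Linearly interpolate one homologous feature across several models.

    Computes sum_i weights[i] * tables[i][pair] for every phrase pair seen in
    any table; a pair missing from a table contributes 0 from that table.
    """
    if len(tables) != len(weights):
        raise ValueError("need exactly one weight per table")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    combined: Dict[PhrasePair, float] = {}
    for table, w in zip(tables, weights):
        for pair, prob in table.items():
            combined[pair] = combined.get(pair, 0.0) + w * prob
    return combined


# Illustrative (made-up) values: one table from a surface-form alignment,
# one from a lemma-based alignment, combined with equal weights.
surface = {("书", "books"): 0.6, ("书", "book"): 0.4}
lemma = {("书", "book"): 0.9, ("书", "books"): 0.1}
print(interpolate_feature([surface, lemma], [0.5, 0.5]))
```

Because the weights sum to one, interpolating conditional distributions that are each normalized over the same source phrase yields another normalized distribution. How the weights should be chosen is left open here; the sketch simply takes them as given.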