An empirical evaluation of stop word removal in statistical machine translation

  • Authors:
  • Chong Tze Yuang;Rafael E. Banchs;Chng Eng Siong

  • Affiliations:
  • Nanyang Technological University, Singapore;Institute for Infocomm Research, A*STAR, Singapore;Nanyang Technological University, Singapore

  • Venue:
  • EACL 2012 Proceedings of the Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra)
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

In this paper we evaluate the possibility of improving the performance of a statistical machine translation system by relaxing the complexity of the translation task by removing the most frequent and predictable terms from the target language vocabulary. Afterwards, the removed terms are inserted back in the relaxed output by using an n-gram based word predictor. Empirically, we have found that when these words are omitted from the text, the perplexity of the text decreases, which may imply the reduction of confusion in the text. We conducted some machine translation experiments to see if this perplexity reduction produced a better translation output. While the word prediction results exhibits 77% accuracy in predicting 40% of the most frequent words in the text, the perplexity reduction did not help to produce better translations.