Class-Based Language Models for Chinese-English Parallel Corpus

  • Authors:
  • Junfei Guo; Juan Liu; Michael Walsh; Helmut Schmid

  • Affiliations:
  • School of Computer, Wuhan University, China, and Institute for Natural Language Processing, University of Stuttgart, Germany; School of Computer, Wuhan University, China; Institute for Natural Language Processing, University of Stuttgart, Germany; Institute for Natural Language Processing, University of Stuttgart, Germany

  • Venue:
  • CICLing'13: Proceedings of the 14th International Conference on Computational Linguistics and Intelligent Text Processing, Volume 2
  • Year:
  • 2013

Abstract

This paper addresses the use of novel class-based language models on parallel corpora, focusing specifically on English and Chinese. We find that the perplexity of Chinese is generally much higher than that of English and discuss possible reasons for this difference. We demonstrate the relative effectiveness of class-based models over the modified Kneser-Ney trigram model for our task. We also introduce a rare-events clustering method and a polynomial discounting mechanism, which are shown to improve results. Our experimental results on parallel corpora indicate that the improvements due to classes are similar for English and Chinese, which suggests that class-based language models should be used for both languages.
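The abstract does not define the paper's novel models, but the general idea behind class-based language modelling, and the perplexity measure used to compare the two languages, can be illustrated with a minimal sketch. It assumes the standard class bigram factorization P(w_i | w_{i-1}) = P(c_i | c_{i-1}) * P(w_i | c_i) with a fixed, hand-assigned word-to-class mapping. The class `ClassBigramLM`, the toy data and the add-one smoothing of class transitions are hypothetical illustrations, not the authors' rare-events clustering or polynomial discounting.

```python
import math
from collections import Counter


class ClassBigramLM:
    """Class-based bigram model: P(w_i | w_{i-1}) = P(c_i | c_{i-1}) * P(w_i | c_i).

    Illustrative sketch only. Assumes a fixed hard clustering `word2class` in
    which "<s>" and "</s>" each have their own class, add-one smoothing on
    class transitions, and that every evaluated word was seen in training.
    """

    def __init__(self, sentences, word2class):
        self.word2class = word2class
        # Classes that can be predicted (the sentence-start class never is).
        self.num_classes = len({c for w, c in word2class.items() if w != "<s>"})
        self.trans = Counter()        # (previous class, class) counts
        self.history = Counter()      # previous-class counts
        self.emit = Counter()         # word counts as predicted tokens
        self.class_total = Counter()  # class counts as predicted tokens
        for sent in sentences:
            tokens = ["<s>"] + sent + ["</s>"]
            for prev, cur in zip(tokens, tokens[1:]):
                cp, cc = word2class[prev], word2class[cur]
                self.trans[(cp, cc)] += 1
                self.history[cp] += 1
                self.emit[cur] += 1
                self.class_total[cc] += 1

    def prob(self, prev, word):
        cp, cc = self.word2class[prev], self.word2class[word]
        # Add-one smoothed class transition probability.
        p_class = (self.trans[(cp, cc)] + 1) / (self.history[cp] + self.num_classes)
        # Maximum-likelihood probability of the word within its class.
        p_word = self.emit[word] / self.class_total[cc]
        return p_class * p_word

    def perplexity(self, sentences):
        # exp of the average negative log-probability per predicted token.
        log_sum, n = 0.0, 0
        for sent in sentences:
            tokens = ["<s>"] + sent + ["</s>"]
            for prev, cur in zip(tokens, tokens[1:]):
                log_sum += math.log(self.prob(prev, cur))
                n += 1
        return math.exp(-log_sum / n)


# Hypothetical toy data: "a cat sleeps" never occurs in training, but its
# class sequence DET N V does, so the model still assigns it probability.
word2class = {"<s>": "BOS", "</s>": "EOS", "the": "DET", "a": "DET",
              "cat": "N", "dog": "N", "runs": "V", "sleeps": "V"}
train = [["the", "cat", "runs"], ["a", "dog", "sleeps"]]
lm = ClassBigramLM(train, word2class)
print(lm.perplexity([["a", "cat", "sleeps"]]))  # 2 ** 1.75, about 3.36
```

Because words in the same class share transition statistics, the unseen word sequence gets the same perplexity as a seen one with the same class pattern. A practical system would replace the add-one step with a stronger discounting scheme; the paper's polynomial discounting is not specified in the abstract, so it is not reproduced here.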