Cross language text classification by model translation and semi-supervised learning

  • Authors: Lei Shi; Rada Mihalcea; Mingjun Tian

  • Affiliations: Yahoo! Global R&D, Beijing, China; University of North Texas, Denton, TX; Yahoo! Global R&D, Beijing, China

  • Venue: EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
  • Year: 2010

Abstract

In this paper, we introduce a method that automatically builds text classifiers in a new language by training on already labeled data in another language. Our method transfers the classification knowledge across languages by translating the model features and by using an Expectation Maximization (EM) algorithm that naturally takes into account the ambiguity associated with the translation of a word. We further exploit the readily available unlabeled data in the target language via semi-supervised learning, and adapt the translated model to better fit the data distribution of the target language.
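
The idea in the abstract can be illustrated with a small sketch. It is not the authors' exact formulation: it assumes a Naive Bayes source model, spreads each source word's probability uniformly over its dictionary translations, and then runs a generic EM loop over unlabeled target documents to re-estimate the translated model. The toy English-to-Spanish dictionary, the class names, the smoothing constant `ALPHA`, and all function names are hypothetical choices made only for illustration.

```python
import math
from collections import defaultdict

ALPHA = 0.01  # smoothing constant (assumption, chosen for this toy data)

def normalize(weights, vocab, alpha=ALPHA):
    """Turn non-negative word weights into a smoothed distribution over vocab."""
    total = sum(weights.get(w, 0.0) for w in vocab) + alpha * len(vocab)
    return {w: (weights.get(w, 0.0) + alpha) / total for w in vocab}

def translate_model(src_probs, dictionary, vocab):
    """Spread each source word's P(word | class) uniformly over its translations."""
    tgt = {}
    for c, probs in src_probs.items():
        weights = defaultdict(float)
        for s_word, p in probs.items():
            candidates = dictionary.get(s_word, [])
            for t_word in candidates:
                weights[t_word] += p / len(candidates)
        tgt[c] = normalize(weights, vocab)
    return tgt

def posterior(doc, priors, word_probs):
    """Naive Bayes P(class | doc), computed in log space for stability."""
    log_scores = {}
    for c, prior in priors.items():
        score = math.log(prior)
        for w in doc:
            if w in word_probs[c]:  # out-of-vocabulary words are ignored
                score += math.log(word_probs[c][w])
        log_scores[c] = score
    top = max(log_scores.values())
    unnorm = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}

def em_adapt(unlabeled_docs, priors, word_probs, vocab, iterations=5):
    """EM on unlabeled target documents, starting from the translated model."""
    for _ in range(iterations):
        counts = {c: defaultdict(float) for c in priors}
        class_mass = {c: 0.0 for c in priors}
        # E-step: soft class assignment for every unlabeled document.
        for doc in unlabeled_docs:
            for c, p in posterior(doc, priors, word_probs).items():
                class_mass[c] += p
                for w in doc:
                    if w in vocab:
                        counts[c][w] += p
        # M-step: re-estimate priors (add-one smoothed) and word distributions.
        n = len(unlabeled_docs)
        priors = {c: (class_mass[c] + 1.0) / (n + len(class_mass)) for c in class_mass}
        word_probs = {c: normalize(counts[c], vocab) for c in counts}
    return priors, word_probs

# Hypothetical source-language (English) model with two classes.
src_probs = {
    "sports": {"match": 0.5, "goal": 0.5},
    "finance": {"bank": 0.5, "market": 0.5},
}
# Hypothetical bilingual dictionary; several entries are ambiguous.
dictionary = {
    "match": ["partido", "cerilla"],
    "goal": ["gol", "meta"],
    "bank": ["banco", "orilla"],
    "market": ["mercado"],
}
vocab = sorted({t for cands in dictionary.values() for t in cands})
# Unlabeled target-language documents (tokenized, made up for the example).
unlabeled = [
    ["partido", "gol", "gol"], ["gol", "partido", "partido"], ["partido", "gol"],
    ["banco", "mercado"], ["mercado", "banco", "banco"], ["mercado", "mercado", "banco"],
]

init_priors = {"sports": 0.5, "finance": 0.5}
init_probs = translate_model(src_probs, dictionary, vocab)
adapted_priors, adapted_probs = em_adapt(unlabeled, init_priors, init_probs, vocab)
```

In this toy setup, the translated model initially gives "partido" (game) and "cerilla" (matchstick) the same weight in the sports class, because the dictionary alone cannot tell them apart. After the EM passes, the weight shifts toward the translation that actually occurs in the unlabeled target documents, which is the kind of adaptation to the target-language distribution the abstract describes.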