Knowledge transfer across multilingual corpora via latent topics

  • Authors: Wim De Smet; Jie Tang; Marie-Francine Moens
  • Affiliations: K.U. Leuven, Leuven, Belgium; Tsinghua University, Beijing, China; K.U. Leuven, Leuven, Belgium
  • Venue: PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part I
  • Year: 2011

Abstract

This paper explores bridging the content of two different languages via latent topics. Specifically, we propose a unified probabilistic model to simultaneously model latent topics from bilingual corpora that discuss comparable content and use the topics as features in a cross-lingual, dictionary-less text categorization task. Experimental results on multilingual Wikipedia data show that the proposed topic model effectively discovers the topic information from the bilingual corpora, and the learned topics successfully transfer classification knowledge to other languages, for which no labeled training data are available.
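The idea of using shared latent topics as language-independent features can be illustrated with a minimal sketch. It is not the paper's model: the per-language topic-word probabilities in `PHI` are hypothetical toy values standing in for the output of a bilingual topic model, the one-step posterior averaging in `topic_features` is a simplification of proper inference, and the nearest-centroid classifier is an assumed stand-in for whatever classifier the authors used. The point is only that a classifier trained on labeled English documents can be applied to unlabeled Dutch documents once both are mapped into the same topic space, with no dictionary involved.

```python
# Hypothetical toy values: P(word | topic) for K = 2 topics shared across languages.
# In practice these would be learned by a bilingual topic model on comparable corpora.
PHI = {
    "en": [
        {"football": 0.45, "goal": 0.45, "guitar": 0.05, "concert": 0.05},  # topic 0
        {"football": 0.05, "goal": 0.05, "guitar": 0.45, "concert": 0.45},  # topic 1
    ],
    "nl": [
        {"voetbal": 0.45, "doelpunt": 0.45, "gitaar": 0.05, "concert": 0.05},  # topic 0
        {"voetbal": 0.05, "doelpunt": 0.05, "gitaar": 0.45, "concert": 0.45},  # topic 1
    ],
}
K = 2


def topic_features(tokens, lang):
    """Map a tokenized document to a K-dimensional topic-proportion vector.

    Simplification: average the per-token topic posterior under a uniform
    topic prior, instead of running full inference.
    """
    theta = [0.0] * K
    used = 0
    for word in tokens:
        probs = [PHI[lang][k].get(word, 0.0) for k in range(K)]
        total = sum(probs)
        if total == 0.0:
            continue  # skip out-of-vocabulary words
        for k in range(K):
            theta[k] += probs[k] / total
        used += 1
    if used == 0:
        return [1.0 / K] * K  # no known words: fall back to uniform
    return [t / used for t in theta]


def train_centroids(labeled_docs, lang):
    """Average the topic vectors of the labeled documents for each class."""
    sums, counts = {}, {}
    for tokens, label in labeled_docs:
        vec = topic_features(tokens, lang)
        acc = sums.setdefault(label, [0.0] * K)
        for k in range(K):
            acc[k] += vec[k]
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(tokens, lang, centroids):
    """Assign the class whose centroid is closest in topic space."""
    vec = topic_features(tokens, lang)

    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(vec, centroid))

    return min(centroids, key=lambda label: sq_dist(centroids[label]))


# Labeled training data exists only in English.
english_train = [
    (["football", "goal", "goal"], "sports"),
    (["guitar", "concert"], "music"),
]
centroids = train_centroids(english_train, "en")

# Dutch documents are classified without any Dutch labels or a dictionary.
dutch_sports = classify(["voetbal", "doelpunt"], "nl", centroids)
dutch_music = classify(["gitaar", "concert", "gitaar"], "nl", centroids)
```

Because both languages index the same topics, the English centroids can be compared directly with Dutch topic vectors; the quality of the transfer then depends entirely on how well the topics align across the two corpora.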