A refinement framework for cross language text categorization

  • Authors:
  • Ke Wu;Bao-Liang Lu

  • Affiliations:
  • Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China;Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China

  • Venue:
  • AIRS'08 Proceedings of the 4th Asia information retrieval conference on Information retrieval technology
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

Cross language text categorization is the task of exploiting labelled documents in a source language (e.g. English) to classify documents in a target language (e.g. Chinese). In this paper, we focus on investigating the use of a bilingual lexicon for cross language text categorization. To this end, we propose a novel refinement framework for cross language text categorization. The framework consists of two stages. In the first stage, a cross language model transfer is proposed to generate initial labels of documents in target language. In the second stage, expectation maximization algorithm based on naive Bayes model is introduced to yield resulting labels of documents. Preliminary experimental results on collected corpora show that the proposed framework is effective.