A refinement framework for cross language text categorization

Authors:
Ke Wu;Bao-Liang Lu
Affiliations:
Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China;Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
Venue:
AIRS'08 Proceedings of the 4th Asia information retrieval conference on Information retrieval technology
Year:
2008

Abstract

Cross language text categorization is the task of exploiting labelled documents in a source language (e.g., English) to classify documents in a target language (e.g., Chinese). In this paper, we investigate the use of a bilingual lexicon for cross language text categorization and propose a novel two-stage refinement framework. In the first stage, a cross language model transfer generates initial labels for the documents in the target language. In the second stage, an expectation maximization (EM) algorithm based on the naive Bayes model refines these initial labels to yield the final labels of the documents. Preliminary experimental results on collected corpora show that the proposed framework is effective.
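
The two stages can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the model transfer is carried out, so the sketch assumes one simple option, namely spreading each source word's class-conditional probability evenly over its lexicon translations. All function names (`m_step`, `e_step`, `transfer_model`, `refine`), the smoothing constants, and the romanized toy data are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def m_step(docs, posteriors, classes, vocab, alpha=1.0):
    """Estimate multinomial naive Bayes parameters from soft-labelled docs."""
    prior, cond = {}, {}
    for c in classes:
        weight = sum(p[c] for p in posteriors)
        prior[c] = (weight + alpha) / (len(docs) + alpha * len(classes))
        counts = Counter()
        for doc, p in zip(docs, posteriors):
            for w in doc:
                if w in vocab:
                    counts[w] += p[c]
        total = sum(counts.values()) + alpha * len(vocab)
        cond[c] = {w: (counts[w] + alpha) / total for w in vocab}
    return prior, cond

def e_step(docs, prior, cond, classes):
    """Compute P(class | doc) for every document under the current model."""
    posteriors = []
    for doc in docs:
        logp = {c: math.log(prior[c])
                + sum(math.log(cond[c][w]) for w in doc if w in cond[c])
                for c in classes}
        top = max(logp.values())  # subtract the max for numerical stability
        expd = {c: math.exp(v - top) for c, v in logp.items()}
        z = sum(expd.values())
        posteriors.append({c: expd[c] / z for c in classes})
    return posteriors

def transfer_model(prior, cond, lexicon, target_vocab, floor=1e-3):
    """ASSUMED transfer rule: split P(source word | class) evenly over the
    word's translations that occur in the target vocabulary."""
    new_cond = {}
    for c, probs in cond.items():
        mass = {t: floor for t in target_vocab}
        for s, p in probs.items():
            translations = [t for t in lexicon.get(s, []) if t in mass]
            for t in translations:
                mass[t] += p / len(translations)
        z = sum(mass.values())
        new_cond[c] = {t: v / z for t, v in mass.items()}
    return dict(prior), new_cond

def refine(source_docs, source_labels, target_docs, lexicon, classes, iters=10):
    src_vocab = {w for d in source_docs for w in d}
    tgt_vocab = {w for d in target_docs for w in d}
    hard = [{c: 1.0 if c == y else 0.0 for c in classes} for y in source_labels]
    # Stage 1: train on the source language, transfer, label target docs.
    prior, cond = m_step(source_docs, hard, classes, src_vocab)
    prior, cond = transfer_model(prior, cond, lexicon, tgt_vocab)
    post = e_step(target_docs, prior, cond, classes)
    # Stage 2: EM over the target documents, starting from those labels.
    for _ in range(iters):
        prior, cond = m_step(target_docs, post, classes, tgt_vocab)
        post = e_step(target_docs, prior, cond, classes)
    return [max(p, key=p.get) for p in post]

# Hypothetical toy data: English source docs, romanized Chinese target docs.
classes = ["sport", "finance"]
source_docs = [["goal", "match", "team"], ["match", "team"],
               ["stock", "market", "bank"], ["bank", "market"]]
source_labels = ["sport", "sport", "finance", "finance"]
lexicon = {"goal": ["jinqiu"], "match": ["bisai"], "team": ["qiudui"],
           "stock": ["gupiao"], "market": ["shichang"], "bank": ["yinhang"]}
target_docs = [["bisai", "qiudui", "qiumi"], ["gupiao", "shichang", "touzi"],
               ["qiumi", "bisai"], ["touzi", "yinhang"]]

predicted = refine(source_docs, source_labels, target_docs, lexicon, classes)
print(predicted)
```

In this toy setup the words `qiumi` and `touzi` have no lexicon entry, so the transferred model carries no evidence for them. The EM stage can still attach them to a class because they co-occur with translated words in the target documents, which is the kind of refinement the second stage is meant to provide.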