Text Classification from Labeled and Unlabeled Documents using EM
Machine Learning - Special issue on information retrieval
Improving query translation for cross-language information retrieval using statistical models
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
A Comparative Study on Feature Selection in Text Categorization
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Implementation of the SMART Information Retrieval System
Implementation of the SMART Information Retrieval System
Computational Linguistics - Special issue on web as corpus
Word translation disambiguation using Bilingual Bootstrapping
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
A maximum coherence model for dictionary-based cross-language information retrieval
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Cross-language text classification
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
An EM Based Training Algorithm for Cross-Language Text Categorization
WI '05 Proceedings of the 2005 IEEE/WIC/ACM International Conference on Web Intelligence
A study of statistical models for query translation: finding a good unit of translation
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Using KCCA for Japanese---English cross-language information retrieval and document classification
Journal of Intelligent Information Systems
Exploiting comparable corpora and bilingual dictionaries for cross-language text categorization
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Automatic acquisition of chinese–english parallel corpus from the web
ECIR'06 Proceedings of the 28th European conference on Advances in Information Retrieval
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Hi-index | 0.00 |
Cross language text categorization is the task of exploiting labelled documents in a source language (e.g. English) to classify documents in a target language (e.g. Chinese). In this paper, we focus on investigating the use of a bilingual lexicon for cross language text categorization. To this end, we propose a novel refinement framework for cross language text categorization. The framework consists of two stages. In the first stage, a cross language model transfer is proposed to generate initial labels of documents in target language. In the second stage, expectation maximization algorithm based on naive Bayes model is introduced to yield resulting labels of documents. Preliminary experimental results on collected corpora show that the proposed framework is effective.