Cross-language text classification (CLTC) aims to take advantage of existing training data in one language to construct a classifier for another language. In addition to the expected translation issues, CLTC is complicated by the cultural distance between the two languages, which causes documents belonging to the same category to concern very different topics. This paper proposes a re-classification method whose purpose is to reduce the errors caused by this phenomenon by considering information from the target-language documents themselves. Experimental results on a news corpus covering three pairs of languages and four categories demonstrated the appropriateness of the proposed method, which improved the initial classification accuracy by up to 11%.
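The abstract does not say how the re-classification step works, so the following is only an illustrative sketch of one way to revise initial labels using the target-language documents themselves. It assumes a nearest-neighbour vote: each document keeps or changes its label according to its own current label plus the labels of its most similar target-language documents. The function names (`cosine`, `reclassify`), the parameters `k` and `max_iters`, and the toy documents are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse term-count mappings."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def reclassify(docs, labels, k=2, max_iters=5):
    """Relabel each document by a vote over itself and its k nearest neighbours.

    Assumption: `labels` are the initial predictions of a classifier trained
    on another language; only target-language documents are compared here.
    """
    vectors = [Counter(d.lower().split()) for d in docs]
    labels = list(labels)
    for _ in range(max_iters):
        new_labels = []
        for i, v in enumerate(vectors):
            # Rank every other target-language document by similarity to doc i.
            sims = sorted(
                ((cosine(v, u), j) for j, u in enumerate(vectors) if j != i),
                reverse=True,
            )
            # The document's current label counts as one vote.
            votes = Counter([labels[i]])
            votes.update(labels[j] for s, j in sims[:k] if s > 0)
            new_labels.append(votes.most_common(1)[0][0])
        if new_labels == labels:  # stop once the labelling is stable
            break
        labels = new_labels
    return labels


# Toy target-language collection; the third document starts with a wrong label.
docs = [
    "election vote parliament minister",
    "election vote minister campaign",
    "parliament vote campaign election",
    "match goal team league",
    "goal team league striker",
    "match team striker league goal",
]
initial = ["politics", "politics", "sports", "sports", "sports", "sports"]
print(reclassify(docs, initial))
```

In this toy run the third document shares its vocabulary only with the two politics documents, so the vote moves it from "sports" to "politics" while the other labels stay unchanged. Real use would need a proper text representation and tuning of `k`.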