Due to the globalization of the Web, many companies and institutions need to efficiently organize and search repositories containing multilingual documents. Managing these heterogeneous text collections significantly increases costs, because experts in different languages are required to organize them. Cross-Language Text Categorization can provide techniques for extending an existing automatic classification system in one language to new languages without additional intervention by human experts. In this paper we propose a learning algorithm based on the EM scheme that can be used to train text classifiers in a multilingual environment. In particular, we assume that a predefined category set and a collection of labeled training data are available for a given language L1. A classifier for a different language L2 is trained by translating the available labeled training set from L1 to L2 and by using an additional set of unlabeled documents from L2. This technique allows us to extract correct statistical properties of the language L2 that are not completely available in automatically translated examples, because of the different characteristics of language L1 and the approximation introduced by the translation process. Our experimental results show that the performance of the proposed method is very promising when applied to a test document set extracted from newsgroups in English and Italian.
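The training scheme described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the base classifier, so a multinomial naive Bayes model is assumed here, the translation step is assumed to have already been applied (the labeled documents arrive as target-language tokens), and the translated labeled documents are assumed to stay fixed in every M-step. All function names (`m_step`, `e_step`, `em_train`, `classify`) and the toy Italian data are hypothetical.

```python
import math
from collections import Counter


def m_step(docs, resp, vocab, n_classes, alpha=1.0):
    """Estimate naive Bayes parameters from per-document class weights."""
    class_mass = [alpha] * n_classes
    word_mass = [Counter() for _ in range(n_classes)]
    for tokens, weights in zip(docs, resp):
        counts = Counter(t for t in tokens if t in vocab)
        for c, w in enumerate(weights):
            class_mass[c] += w
            for tok, n in counts.items():
                word_mass[c][tok] += w * n
    total = sum(class_mass)
    log_prior = [math.log(m / total) for m in class_mass]
    log_like = []
    for c in range(n_classes):
        # Laplace smoothing keeps unseen words at non-zero probability.
        denom = sum(word_mass[c].values()) + alpha * len(vocab)
        log_like.append(
            {t: math.log((word_mass[c][t] + alpha) / denom) for t in vocab}
        )
    return log_prior, log_like


def e_step(tokens, model):
    """Return the posterior class distribution for one document."""
    log_prior, log_like = model
    scores = [
        lp + sum(ll[t] for t in tokens if t in ll)
        for lp, ll in zip(log_prior, log_like)
    ]
    top = max(scores)  # subtract the max for numerical stability
    exp = [math.exp(s - top) for s in scores]
    z = sum(exp)
    return [e / z for e in exp]


def em_train(translated_docs, labels, unlabeled_docs, n_classes, iterations=10):
    """Initialize on translated labeled data, then refine with EM."""
    vocab = {t for d in translated_docs + unlabeled_docs for t in d}
    hard = [[1.0 if c == y else 0.0 for c in range(n_classes)] for y in labels]
    model = m_step(translated_docs, hard, vocab, n_classes)
    for _ in range(iterations):
        # E-step: soft-label the unlabeled target-language documents.
        soft = [e_step(d, model) for d in unlabeled_docs]
        # M-step: re-estimate from translated and soft-labeled documents.
        model = m_step(translated_docs + unlabeled_docs, hard + soft,
                       vocab, n_classes)
    return model


def classify(tokens, model):
    """Return the index of the most probable class."""
    post = e_step(tokens, model)
    return max(range(len(post)), key=post.__getitem__)


# Toy data (hypothetical): class 0 = sport, class 1 = computing.
translated = [["partita", "squadra", "gol"], ["computer", "programma", "disco"]]
labels = [0, 1]
unlabeled = [
    ["squadra", "calcio", "allenatore"],
    ["calcio", "gol", "allenatore"],
    ["programma", "software", "tastiera"],
    ["software", "computer", "tastiera"],
]
model = em_train(translated, labels, unlabeled, n_classes=2)
print(classify(["calcio", "allenatore"], model))
print(classify(["software", "tastiera"], model))
```

In this toy setup the words "calcio", "allenatore", "software", and "tastiera" never occur in the translated labeled documents. Their class statistics are learned only from the unlabeled target-language documents, which is the kind of target-language information the abstract says translated examples alone do not fully provide.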