An EM Based Training Algorithm for Cross-Language Text Categorization

Authors:
Leonardo Rigutini;Marco Maggini;Bing Liu
Affiliations:
Università di Siena;Università di Siena;University of Illinois at Chicago
Venue:
WI '05 Proceedings of the 2005 IEEE/WIC/ACM International Conference on Web Intelligence
Year:
2005

Abstract

Due to the globalization of the Web, many companies and institutions need to efficiently organize and search repositories containing multilingual documents. Managing these heterogeneous text collections increases costs significantly, because experts in different languages are required to organize them. Cross-Language Text Categorization can provide techniques to extend existing automatic classification systems in one language to new languages without requiring additional intervention by human experts. In this paper we propose a learning algorithm based on the EM scheme that can be used to train text classifiers in a multilingual environment. In particular, we assume that a predefined category set and a collection of labeled training data are available for a given language L1. A classifier for a different language L2 is trained by translating the available labeled training set from L1 to L2 and by using an additional set of unlabeled documents in L2. This technique allows us to extract correct statistical properties of the language L2 that are not completely available in automatically translated examples, because of the different characteristics of language L1 and the approximations of the translation process. Our experimental results show that the performance of the proposed method is very promising when applied to a test document set extracted from newsgroups in English and Italian.
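
The training scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the base classifier, so the sketch assumes a multinomial Naive Bayes model with Laplace smoothing, keeps the translated documents in every M-step, and uses hypothetical function names (`train_nb`, `posterior`, `em_cross_language`) and invented toy data. Documents are represented as lists of tokens already expressed in the target language L2.

```python
import math
from collections import Counter

def train_nb(docs, weights, classes, vocab, alpha=1.0):
    """Fit multinomial Naive Bayes from per-document soft class weights."""
    class_mass = {c: alpha for c in classes}      # smoothed class totals
    counts = {c: Counter() for c in classes}      # weighted token counts
    for doc, w in zip(docs, weights):
        for c in classes:
            class_mass[c] += w[c]
            for tok in doc:
                counts[c][tok] += w[c]
    total = sum(class_mass.values())
    log_prior = {c: math.log(class_mass[c] / total) for c in classes}
    log_lik = {}
    for c in classes:
        denom = sum(counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {t: math.log((counts[c][t] + alpha) / denom) for t in vocab}
    return log_prior, log_lik

def posterior(doc, model, classes):
    """Return P(class | doc); tokens outside the vocabulary are ignored."""
    log_prior, log_lik = model
    scores = {
        c: log_prior[c] + sum(log_lik[c][t] for t in doc if t in log_lik[c])
        for c in classes
    }
    top = max(scores.values())                    # subtract max for stability
    expd = {c: math.exp(s - top) for c, s in scores.items()}
    norm = sum(expd.values())
    return {c: v / norm for c, v in expd.items()}

def em_cross_language(translated, labels, unlabeled, classes, iterations=10):
    """Initialise on translated L1 data, then refine with unlabeled L2 data."""
    vocab = {t for d in translated + unlabeled for t in d}
    hard = [{c: 1.0 if c == y else 0.0 for c in classes} for y in labels]
    model = train_nb(translated, hard, classes, vocab)
    for _ in range(iterations):
        # E-step: estimate class posteriors for the unlabeled L2 documents.
        soft = [posterior(d, model, classes) for d in unlabeled]
        # M-step: re-estimate parameters from labeled plus soft-labeled data.
        model = train_nb(translated + unlabeled, hard + soft, classes, vocab)
    return model

# Toy data (invented): "gol" and "elezioni" never occur in the translated set.
classes = ["sport", "politics"]
translated = [["calcio", "partita"], ["partita", "squadra"],
              ["governo", "voto"], ["voto", "legge"]]
labels = ["sport", "sport", "politics", "politics"]
unlabeled = [["calcio", "gol", "gol"], ["gol", "squadra"],
             ["governo", "elezioni", "elezioni"], ["elezioni", "legge"]]

baseline = em_cross_language(translated, labels, unlabeled, classes, iterations=0)
refined = em_cross_language(translated, labels, unlabeled, classes, iterations=10)
print(posterior(["gol"], baseline, classes))
print(posterior(["gol"], refined, classes))
```

In this toy setting the classifier trained only on translated examples has no evidence about "gol", while the EM iterations pick up its association with the sport class from the unlabeled L2 documents. This is the kind of target-language statistic the abstract says is missing from automatically translated examples.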