Cross-lingual text classification with model translation and document translation

Authors:
Teng-Sheng Moh; Zhang Zhang
Affiliations:
San Jose State University, San Jose, CA; San Jose State University, San Jose, CA
Venue:
Proceedings of the 50th Annual Southeast Regional Conference
Year:
2012

Abstract

Conventional text classification assumes that all documents are in the same language, so a model trained on documents in one language cannot directly categorize documents written in another. The most direct solution is to translate all documents in the other languages into a single language with a machine translator. Another approach is to translate the features extracted from one language into a second language and use them to classify documents in that second language. In this paper, the authors propose a new method that combines the model translation and document translation methods, so that it can take advantage of the strengths of both.
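The two strategies in the abstract can be contrasted with a minimal sketch. It assumes a toy linear bag-of-words model trained on English, a made-up Spanish-to-English word list standing in for a machine translator (document translation), and a separate made-up English-to-Spanish dictionary for carrying feature weights across (model translation). Spanish, the data, both word lists, and all function names are hypothetical, and accents are omitted. The final averaging step is one simple illustrative way to use both views; it is an assumption, not the combination method proposed in the paper.

```python
"""Toy sketch: document translation vs. model translation for
cross-lingual text classification. All data, word lists, and the
combination rule are hypothetical illustrations."""

# Labeled English training data: 1 = sports, 0 = finance (hypothetical).
TRAIN_EN = [
    ("team won the match", 1),
    ("the player scored a goal", 1),
    ("the bank raised interest rates", 0),
    ("the market fell and stocks dropped", 0),
]

# Hypothetical stand-in for a machine translator (Spanish -> English).
# It happens not to know "gol".
MT_ES_TO_EN = {
    "el": "the", "equipo": "team", "gano": "won", "partido": "match",
    "mercado": "market", "de": "of", "acciones": "stocks", "cayo": "fell",
}

# Hypothetical bilingual dictionary for translating model features
# (English -> Spanish). It happens not to know "stocks".
DICT_EN_TO_ES = {
    "team": ["equipo"], "won": ["gano"], "match": ["partido"],
    "goal": ["gol"], "market": ["mercado"], "fell": ["cayo"],
}


def train_weights(samples):
    """Linear bag-of-words model: +1 per occurrence in class 1, -1 in class 0.
    Deliberately simplistic; a real system would use e.g. an SVM."""
    weights = {}
    for text, label in samples:
        for word in text.split():
            weights[word] = weights.get(word, 0) + (1 if label == 1 else -1)
    return weights


def score(text, weights):
    """Sum of feature weights; words without a weight contribute 0."""
    return sum(weights.get(word, 0) for word in text.split())


def translate_document(text, translator):
    """Document translation: map each word into the training language,
    dropping words the translator does not know."""
    return " ".join(translator[w] for w in text.split() if w in translator)


def translate_model(weights, dictionary):
    """Model translation: copy each feature weight onto its translations
    in the target language; features with no translation are lost."""
    translated = {}
    for word, w in weights.items():
        for target in dictionary.get(word, []):
            translated[target] = translated.get(target, 0) + w
    return translated


def combined_score(doc, translator, weights_src, weights_tgt, alpha=0.5):
    """Weighted average of the document-translation and model-translation
    scores. An illustrative assumption, not the paper's method."""
    doc_score = score(translate_document(doc, translator), weights_src)
    model_score = score(doc, weights_tgt)
    return alpha * doc_score + (1 - alpha) * model_score


def classify(value):
    """Positive score -> class 1 (sports), otherwise class 0 (finance)."""
    return 1 if value > 0 else 0


if __name__ == "__main__":
    w_en = train_weights(TRAIN_EN)
    w_es = translate_model(w_en, DICT_EN_TO_ES)
    for doc in ["el equipo marco un gol", "el mercado de acciones cayo"]:
        s = combined_score(doc, MT_ES_TO_EN, w_en, w_es)
        print(doc, s, classify(s))
```

Running the script prints a combined score of 1.5 (class 1) for the first document and -2.5 (class 0) for the second. In this toy setup each strategy misses vocabulary the other covers: the translator drops gol, while the feature dictionary cannot carry over the weight for stocks, which gives one intuition for why using both views can help.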