Cross-lingual text classification with model translation and document translation

  • Authors:
  • Teng-Sheng Moh; Zhang Zhang

  • Affiliations:
  • San Jose State University, San Jose, CA; San Jose State University, San Jose, CA

  • Venue:
  • Proceedings of the 50th Annual Southeast Regional Conference
  • Year:
  • 2012

Abstract

Text classification generally assumes that all documents are written in the same language, so a model trained on one language cannot categorize documents written in other languages. The most direct solution is to translate every document into a single language with a machine translator (document translation). Another approach is to translate the features extracted from one language into a second language and use them to classify documents in that second language (model translation). In this paper, the authors propose a new method that combines model translation and document translation so that it can take advantage of the strengths of both approaches.
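The two baseline approaches named in the abstract can be contrasted with a minimal sketch. It assumes a toy multinomial Naive Bayes classifier trained on English text and a hypothetical word-for-word English-Spanish dictionary (`EN_TO_ES`) standing in for a real machine translator. Document translation maps the Spanish test document into English and applies the English model. Model translation instead maps the model's English features into Spanish and scores the Spanish document directly. The training data, the dictionary, and the `combined` function (a simple weighted average of the two scores) are illustrative assumptions only; they are not the authors' proposed method, whose details the abstract does not give.

```python
import math
from collections import Counter, defaultdict

# Toy English training data (hypothetical, for illustration only).
TRAIN_EN = [
    ("the team won the match", "sports"),
    ("the player scored a goal", "sports"),
    ("the market fell today", "finance"),
    ("the bank raised interest rates", "finance"),
]

# Hypothetical word-level EN->ES dictionary standing in for a machine translator.
EN_TO_ES = {
    "the": "el", "team": "equipo", "won": "ganó", "match": "partido",
    "player": "jugador", "scored": "marcó", "a": "un", "goal": "gol",
    "market": "mercado", "fell": "cayó", "today": "hoy",
    "bank": "banco", "raised": "subió", "interest": "interés", "rates": "tasas",
}
ES_TO_EN = {es: en for en, es in EN_TO_ES.items()}


def train_nb(docs, alpha=1.0):
    """Train multinomial Naive Bayes with Laplace smoothing; returns (log priors, log likelihoods)."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        tokens = text.split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    priors = {c: math.log(n / len(docs)) for c, n in class_counts.items()}
    likelihoods = {}
    for c in class_counts:
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        likelihoods[c] = {w: math.log((word_counts[c][w] + alpha) / total) for w in vocab}
    return priors, likelihoods


def score(model, tokens):
    """Per-class log scores; tokens outside the model's vocabulary are ignored."""
    priors, likelihoods = model
    return {c: priors[c] + sum(likelihoods[c][t] for t in tokens if t in likelihoods[c])
            for c in priors}


def document_translation(model_en, doc_es):
    """Translate the Spanish document into English word by word, then apply the English model."""
    tokens = [ES_TO_EN.get(t, t) for t in doc_es.split()]
    return score(model_en, tokens)


def model_translation(model_en, doc_es):
    """Translate the model's features (vocabulary) into Spanish, then score the document directly."""
    priors, likelihoods = model_en
    translated = {c: {EN_TO_ES.get(w, w): lp for w, lp in feats.items()}
                  for c, feats in likelihoods.items()}
    return score((priors, translated), doc_es.split())


def combined(model_en, doc_es, weight=0.5):
    """Hypothetical combination (NOT the paper's method): weighted average of both log scores."""
    s_doc = document_translation(model_en, doc_es)
    s_mod = model_translation(model_en, doc_es)
    return {c: weight * s_doc[c] + (1 - weight) * s_mod[c] for c in s_doc}


def predict(scores):
    """Return the class with the highest log score."""
    return max(scores, key=scores.get)


model = train_nb(TRAIN_EN)
doc = "el jugador marcó un gol"
print(predict(document_translation(model, doc)))  # sports
print(predict(model_translation(model, doc)))     # sports
print(predict(combined(model, doc)))              # sports
```

Because this toy dictionary is a strict one-to-one word mapping, the two baselines produce identical scores here. With a real machine translator, which is context-dependent and not one-to-one, translating the document and translating the features can yield different results, which is what makes combining them worth considering.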