Construction of supervised and unsupervised learning systems for multilingual text categorization

  • Authors:
  • Chung-Hong Lee; Hsin-Chang Yang

  • Affiliations:
  • Department of Electrical Engineering, National Kaohsiung University of Applied Sciences, Kaohsiung, Taiwan; Department of Information Management, National University of Kaohsiung, Kaohsiung, Taiwan

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2009

Quantified Score

Hi-index 12.05

Abstract

Because huge amounts of textual data are now available from a wide variety of sources, users of internationally distributed information need effective methods and tools to discover, retrieve, and categorize relevant information, whatever language and form it is stored in. This need has drawn together diverse research communities working on multilingual text categorization. In this work, we implemented and measured the performance of leading supervised and unsupervised approaches to multilingual text categorization. For system implementation, we selected support vector machines (SVM) to represent the supervised techniques, and latent semantic indexing (LSI) and self-organizing maps (SOM) to represent the unsupervised ones. The preliminary results show that our platform models, covering both supervised and unsupervised learning methods, have potential for multilingual text categorization.