Parallel text categorization for multi-dimensional data

  • Authors:
  • Verayuth Lertnattee;Thanaruk Theeramunkong

  • Affiliations:
  • Faculty of Pharmacy, Silpakorn University, Nakorn Pathom, Thailand;Sirindhorn International Institutue of Technology, Thammasat University, Pathum Thani

  • Venue:
  • PDCAT'04 Proceedings of the 5th international conference on Parallel and Distributed Computing: Applications and Technologies
  • Year:
  • 2004

Abstract

In this paper, we propose a multi-dimensional category model (MDCM) for classifying multi-dimensional text collections. The model allows the text classification process to be parallelized and distributed, with each dimension handled separately. With this model, classifier performance improves in both accuracy and time complexity. Accuracy benefits in two ways. First, classifiers learn from a larger set of training documents spread over a small number of classes on each dimension. Second, we can select the best classifier for each dimension and combine their results. For time complexity, the learning and classifying phases can run in a parallel and distributed manner. The efficiency of MDCM is investigated on a drug information data set that assigns monograph topics in the first dimension and primary therapeutic classes in the second dimension. The experimental results show that parallel text classification on MDCM outperforms the flat model in both accuracy and time complexity.
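
The per-dimension idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the term-frequency centroid classifier, the function names (train_centroids, predict, train_mdcm, classify_mdcm), the dimension names, and the four toy documents with their labels are all assumptions made for illustration. One independent classifier is trained per dimension, the dimensions are processed concurrently, and the per-dimension predictions are combined into one multi-dimensional label.

```python
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor


def train_centroids(docs, labels):
    """Toy stand-in classifier: one term-frequency centroid per class."""
    centroids = defaultdict(Counter)
    for text, label in zip(docs, labels):
        centroids[label].update(text.lower().split())
    return dict(centroids)


def predict(centroids, text):
    """Pick the class whose centroid covers the largest share of the query terms."""
    words = text.lower().split()

    def score(label):
        centroid = centroids[label]
        return sum(centroid[w] for w in words) / sum(centroid.values())

    # sorted() makes tie-breaking deterministic
    return max(sorted(centroids), key=score)


def train_mdcm(docs, labels_by_dim):
    """Train one classifier per dimension; each dimension is an independent task."""
    with ThreadPoolExecutor() as pool:
        futures = {
            dim: pool.submit(train_centroids, docs, labels)
            for dim, labels in labels_by_dim.items()
        }
        return {dim: future.result() for dim, future in futures.items()}


def classify_mdcm(models, text):
    """Classify on every dimension concurrently and combine the results."""
    with ThreadPoolExecutor() as pool:
        futures = {
            dim: pool.submit(predict, model, text) for dim, model in models.items()
        }
        return {dim: future.result() for dim, future in futures.items()}


# Hypothetical toy data: every document carries one label per dimension.
docs = [
    "amoxicillin take 500 mg every eight hours",
    "amoxicillin may cause rash and diarrhea",
    "ibuprofen take 400 mg every six hours",
    "ibuprofen may cause stomach upset and nausea",
]
labels_by_dim = {
    "topic": ["dosage", "adverse", "dosage", "adverse"],
    "class": ["antibiotic", "antibiotic", "analgesic", "analgesic"],
}

models = train_mdcm(docs, labels_by_dim)
result = classify_mdcm(models, "ibuprofen may cause nausea")
print(result)  # {'topic': 'adverse', 'class': 'analgesic'}
```

A flat model would instead treat each label combination (for example dosage+antibiotic) as its own class, giving four classes with one training document each in this toy set, whereas each dimension here has two classes with two documents each. Threads are used only to keep the sketch short; because of CPython's global interpreter lock, CPU-bound training would need a ProcessPoolExecutor or separate machines to obtain a real speed-up.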