Parallel text categorization for multi-dimensional data

Authors:
Verayuth Lertnattee;Thanaruk Theeramunkong
Affiliations:
Faculty of Pharmacy, Silpakorn University, Nakorn Pathom, Thailand;Sirindhorn International Institute of Technology, Thammasat University, Pathum Thani
Venue:
PDCAT'04 Proceedings of the 5th International Conference on Parallel and Distributed Computing: Applications and Technologies
Year:
2004

Abstract

In this paper, we propose a multi-dimensional category model (MDCM) for classifying multi-dimensional text collections. The text classification process can be parallelized and distributed by running it separately on each dimension. With this model, classifier performance improves in both accuracy and time complexity. Accuracy benefits in two ways. First, on each dimension, classifiers learn from a larger set of training documents with a small number of classes. Second, the best classifier can be selected for each dimension, and the results from the dimensions can then be combined. For time complexity, the learning and classifying phases can run in a parallel and distributed manner. The efficiency of MDCM is investigated on a drug information data set in which the first dimension assigns topics in monographs and the second dimension assigns primary therapeutic classes. The experimental results show that parallel text classification on MDCM performs better than the flat model in both accuracy and time complexity.
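
The per-dimension scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the classifier is a toy normalized term-overlap scorer chosen for brevity, the four documents and their labels are invented, and the names `train_dimension`, `classify_dimension`, `train_mdcm`, and `classify_mdcm` are hypothetical. Threads are used only to show the structure; for CPU-bound work, `ProcessPoolExecutor` or separate machines would be the more likely choice.

```python
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor


def train_dimension(docs, labels):
    """Build one term-count profile per class for a single dimension."""
    profiles = defaultdict(Counter)
    for text, label in zip(docs, labels):
        profiles[label].update(text.lower().split())
    return dict(profiles)


def classify_dimension(profiles, text):
    """Pick the class whose profile gives the highest normalized overlap."""
    words = text.lower().split()

    def score(label):
        profile = profiles[label]
        return sum(profile[w] for w in words) / sum(profile.values())

    # sorted() makes tie-breaking deterministic.
    return max(sorted(profiles), key=score)


def train_mdcm(docs, labels_by_dim):
    """Train one independent classifier per dimension, one task each."""
    with ThreadPoolExecutor() as pool:
        futures = {
            dim: pool.submit(train_dimension, docs, labels)
            for dim, labels in labels_by_dim.items()
        }
        return {dim: fut.result() for dim, fut in futures.items()}


def classify_mdcm(models, text):
    """Classify on every dimension concurrently and combine the results."""
    with ThreadPoolExecutor() as pool:
        futures = {
            dim: pool.submit(classify_dimension, model, text)
            for dim, model in models.items()
        }
        return {dim: fut.result() for dim, fut in futures.items()}


# Invented toy data: every document carries one label per dimension.
docs = [
    "amoxicillin dose 500 mg every eight hours",
    "amoxicillin may cause rash and diarrhea",
    "ibuprofen dose 400 mg every six hours",
    "ibuprofen may cause stomach upset and bleeding",
]
labels_by_dim = {
    "topic": ["dosage", "adverse", "dosage", "adverse"],
    "therapeutic_class": ["antibiotic", "antibiotic", "analgesic", "analgesic"],
}

models = train_mdcm(docs, labels_by_dim)
print(classify_mdcm(models, "ibuprofen may cause bleeding"))
```

In this toy setup each dimension has two classes and trains on all four documents, whereas a flat model over the same data would have four combined classes with one document each. That is the structural property the abstract credits for the accuracy gain.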