Associating documents with relevant categories is critical for effective document retrieval. Here, we compare three algorithms for document categorization: the well-known k-Nearest Neighbor (kNN) algorithm, the centroid-based classifier, and the Highest Average Similarity over Retrieved Documents (HASRD) algorithm. We evaluate their performance on the Reuters-21578 corpus using several measures, including the micro- and macro-averaged F1 values. The empirical results show that, for common document categories, kNN performs best, followed by our adapted HASRD and then the centroid-based classifier, while for rare document categories the centroid-based classifier and kNN outperform our adapted HASRD. Additionally, our study clearly indicates that each classifier performs optimally only when a suitable term weighting scheme is used. These results suggest several directions for future exploration.
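Because the comparison distinguishes common from rare categories, it may help to see how the two F1 averages differ. The following is a minimal sketch of the standard definitions, not the evaluation code used in the study: the function names `f1` and `micro_macro_f1` are hypothetical, and the toy labels are made up for illustration rather than taken from the reported experiments. Micro-averaging pools the counts over all categories, so frequent categories dominate it; macro-averaging takes the mean of the per-category F1 values, so each rare category counts as much as a common one.

```python
from collections import Counter


def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there are no counts at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(gold, pred):
    """Return (micro_f1, macro_f1).

    gold, pred: equal-length lists holding one set of category labels
    per document (a document may belong to several categories).
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        for c in g & p:   # correctly assigned categories
            tp[c] += 1
        for c in p - g:   # assigned but wrong
            fp[c] += 1
        for c in g - p:   # missed
            fn[c] += 1

    categories = set(tp) | set(fp) | set(fn)
    # Micro: pool the counts over all categories, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per category, then take the unweighted mean.
    if categories:
        macro = sum(f1(tp[c], fp[c], fn[c]) for c in categories) / len(categories)
    else:
        macro = 0.0
    return micro, macro


# Toy illustration (made-up data): one common and two rare categories.
gold = [{"earn"}, {"earn"}, {"earn"}, {"acq"}, {"corn"}]
pred = [{"earn"}, {"earn"}, {"earn"}, {"acq"}, {"earn"}]
micro, macro = micro_macro_f1(gold, pred)
print(f"micro F1 = {micro:.3f}, macro F1 = {macro:.3f}")
```

In this toy case the single missed document in the rare category lowers the macro value much more than the micro value, which is why the two averages can rank classifiers differently on common and rare categories.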