We analyze the category scoring algorithms for k-NN classifiers found in the literature, including the majority voting algorithm (MVA) and the simple sum algorithm (SSA), which are the two most widely used algorithms for estimating the scores of candidate categories in k-NN classification systems. Based on the hypothesis that exploiting the internal relations between documents and categories could improve system performance, we propose two new weighted scoring models: the concept-based weighting (CBW) score model and the term independence-based weighting (IBW) score model. Our experimental results confirm this hypothesis: in terms of average precision, IBW and CBW outperform the other score models, and SSA outperforms MVA. In terms of macro-averaged F1, CBW performs best. The Rocchio-based algorithm (RBA) performs worst in all cases.
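The abstract does not spell out the MVA and SSA formulas, so the following is a minimal sketch assuming their common formulations: MVA gives a category one vote for each of the k nearest neighbors that carries it, while SSA adds up the neighbors' similarities to the query instead. Documents are assumed to be sparse term-weight dictionaries compared by cosine similarity, and each training document may carry several categories. The helper names (`cosine`, `k_nearest`, `mva_scores`, `ssa_scores`) and the toy data are hypothetical. The proposed CBW and IBW models, and RBA, are not sketched because the abstract gives no details of them.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu2 = sum(w * w for w in u.values())
    nv2 = sum(w * w for w in v.values())
    if nu2 == 0.0 or nv2 == 0.0:
        return 0.0
    return dot / math.sqrt(nu2 * nv2)


def k_nearest(query, training, k):
    """Return the k training documents most similar to the query.

    training is a list of (vector, set_of_categories); the result is a list of
    (similarity, set_of_categories) pairs sorted by decreasing similarity.
    """
    scored = [(cosine(query, vec), cats) for vec, cats in training]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def mva_scores(neighbors):
    """Majority voting (assumed form): one vote per neighbor per category it carries."""
    scores = defaultdict(float)
    for _sim, cats in neighbors:
        for c in cats:
            scores[c] += 1.0
    return dict(scores)


def ssa_scores(neighbors):
    """Simple sum (assumed form): each neighbor adds its similarity to its categories."""
    scores = defaultdict(float)
    for sim, cats in neighbors:
        for c in cats:
            scores[c] += sim
    return dict(scores)


# Hypothetical toy collection: two "crude" and two "grain" training documents.
training = [
    ({"oil": 1.0, "price": 1.0}, {"crude"}),
    ({"oil": 1.0, "barrel": 1.0}, {"crude"}),
    ({"wheat": 1.0, "export": 1.0}, {"grain"}),
    ({"wheat": 1.0, "price": 1.0}, {"grain"}),
]
query = {"oil": 1.0, "price": 1.0}

neighbors = k_nearest(query, training, k=3)
print(mva_scores(neighbors))  # {'crude': 2.0, 'grain': 1.0}
print(ssa_scores(neighbors))  # {'crude': 1.5, 'grain': 0.5}
```

Both schemes rank "crude" first here. The difference is that SSA lets a very close neighbor outweigh several distant ones, whereas MVA treats every neighbor among the k equally.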