Information storage and retrieval
In this paper, we propose a new algorithm that incorporates the relationships of concept-based thesauri into document categorization with the k-nearest neighbor (k-NN) classifier. k-NN is one of the most popular document categorization methods because it performs relatively well despite its simplicity. However, its precision degrades significantly when ambiguity arises, i.e., when there is more than one candidate category to which a document can be assigned. To remedy this drawback, we employ concept-based thesauri in the categorization. Employing a thesaurus entails structuring the categories into hierarchies, since their structure must conform to that of the thesaurus in order to capture the relationships between categories. By referencing the various relationships in the thesaurus that correspond to the structured categories, k-NN can be markedly improved because the ambiguity is removed. Our method first categorizes documents with k-NN and then applies the thesaurus relationships to reduce the ambiguity. Experimental results show that this method improves the precision of k-NN by up to 13.86% without compromising its recall.
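The two-stage idea described above (k-NN first, thesaurus relationships only when the result is ambiguous) can be illustrated with a minimal Python sketch. The abstract does not specify how ambiguity is detected or how the relationships are scored, so everything here is an assumption made for illustration: the `margin` threshold, the `support` rule, the flat `RELATED` map standing in for a concept-based thesaurus, and the toy `TRAINING` data are all hypothetical and are not the authors' method.

```python
import math
from collections import Counter

# Hypothetical toy data: (document text, category) pairs.
TRAINING = [
    ("bank loan interest", "finance"),
    ("river bank water", "geography"),
    ("loan market credit", "economy"),
    ("mountain valley rock", "geology"),
]

# Hypothetical stand-in for thesaurus relationships between categories.
RELATED = {
    "finance": ["economy"],
    "economy": ["finance"],
    "geography": ["geology"],
    "geology": ["geography"],
}

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_scores(doc, training, k):
    """Sum the similarities of the k nearest training documents per category."""
    query = Counter(doc.split())
    nearest = sorted(
        ((cosine(query, Counter(text.split())), cat) for text, cat in training),
        reverse=True,
    )[:k]
    scores = Counter()
    for sim, cat in nearest:
        scores[cat] += sim
    return scores

def categorize(doc, training, related, k=3, margin=0.1):
    """Stage 1: plain k-NN. Stage 2 (assumed rule): if several categories
    score within `margin` of the best one, prefer the candidate whose
    related categories also received k-NN votes."""
    scores = knn_scores(doc, training, k)
    ranked = scores.most_common()
    best_score = ranked[0][1]
    candidates = [cat for cat, s in ranked if best_score - s <= margin]
    if len(candidates) == 1:
        return candidates[0]  # no ambiguity, keep the k-NN answer

    def support(cat):
        # Own score plus the scores of thesaurus-related categories.
        return scores[cat] + sum(scores[r] for r in related.get(cat, ()))

    return max(candidates, key=support)

label = categorize("bank interest river market", TRAINING, RELATED)
```

In this toy run, "finance" and "geography" receive identical k-NN scores for the query, so plain k-NN cannot separate them; the assumed rule picks "finance" because the related category "economy" is also among the neighbors' votes. A hierarchical thesaurus would offer richer relationship types than the flat map used here.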