In this paper, we propose a new algorithm that incorporates the relationships of concept-based thesauri into document categorization with the k-nearest neighbor (k-NN) classifier. k-NN is one of the most popular document categorization methods because it performs relatively well in spite of its simplicity. However, its precision degrades significantly when ambiguity arises, i.e., when there is more than one candidate category to which a document can be assigned. To remedy this drawback, we employ concept-based thesauri in the categorization. Employing a thesaurus entails structuring the categories into hierarchies, since their structure must conform to that of the thesaurus in order to capture the relationships between categories. By referencing the various relationships in the thesaurus that correspond to the structured categories, k-NN can be substantially improved by removing the ambiguity. We first perform document categorization using k-NN and then employ the relationships to reduce the ambiguity. Experimental results show that this method improves the precision of k-NN by up to 13.86% without compromising its recall.
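The two-stage idea described above (categorize with k-NN first, then consult thesaurus relationships only when several categories remain candidates) can be illustrated with a minimal sketch. The abstract does not specify how the relationships are scored, so everything here is an assumption made for illustration: the `margin` that defines ambiguity, the `weight` given to related categories, the flat `related` dictionary standing in for a concept-based thesaurus, and the toy documents.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_scores(query, training, k):
    """Sum the similarities of the k nearest training documents per category."""
    neighbours = sorted(
        ((cosine(query, vec), cat) for vec, cat in training),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    scores = defaultdict(float)
    for sim, cat in neighbours:
        scores[cat] += sim
    return dict(scores)


def categorize(query, training, related, k=3, margin=0.1, weight=0.5):
    """Plain k-NN first; thesaurus relationships only break near-ties.

    Assumes `training` is non-empty. `margin` and `weight` are hypothetical
    parameters, not values taken from the paper.
    """
    scores = knn_scores(query, training, k)
    ranked = sorted(scores, key=scores.get, reverse=True)
    best = ranked[0]
    # Candidates are the categories whose score is within `margin` of the best.
    candidates = [c for c in ranked if scores[best] - scores[c] <= margin]
    if len(candidates) == 1:
        return best  # no ambiguity: keep the plain k-NN decision

    def boosted(cat):
        # Credit a candidate with the evidence found for its related categories.
        support = sum(scores.get(r, 0.0) for r in related.get(cat, ()))
        return scores[cat] + weight * support

    return max(candidates, key=boosted)


# Toy data (hypothetical): one training document per category.
training = [
    ({"transfer": 1, "fee": 1, "bank": 1}, "finance"),
    ({"goal": 1, "transfer": 1, "striker": 1, "league": 1}, "soccer"),
    ({"goal": 1, "team": 1, "season": 1}, "sports"),
]
# Stand-in for thesaurus relationships between categories (e.g. broader terms).
related = {"soccer": ["sports"], "finance": ["economy"]}
query = {"goal": 1, "transfer": 1, "fee": 1}

plain = categorize(query, training, related, margin=0.0)  # thesaurus never used
refined = categorize(query, training, related)            # thesaurus breaks the near-tie
```

In this toy setting "finance" and "soccer" score almost equally under plain k-NN, and the evidence for the related category "sports" tips the decision toward "soccer". A real system would replace the `related` dictionary with lookups in an actual thesaurus hierarchy.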