A statistical similarity measure
SIGIR '87 Proceedings of the 10th annual international ACM SIGIR conference on Research and development in information retrieval
Intelligent high-volume text processing using shallow, domain-specific techniques
Text-based intelligent systems
An evaluation of phrasal and clustered representations on a text categorization task
SIGIR '92 Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval
Classifying news stories using memory based reasoning
SIGIR '92 Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval
Automated learning of decision rules for text categorization
ACM Transactions on Information Systems (TOIS)
SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
Machine learning in automated text categorization
ACM Computing Surveys (CSUR)
A scalability analysis of classifiers in text categorization
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
Hi-index | 0.00 |
Traditional approaches in document categorization use the term-based classification techniques to classify the documents. The techniques, for enormous terms, are not effective to the applications that need speedy response or not much space. This paper presents an effective concept-based document categorization system, which can efficiently classify Korean documents through the thesaurus tool. The thesaurus tool is the information extractor that acquires the meanings of document terms from the thesaurus. It supports effective document categorization with the acquired meanings. The system uses the concept-probability vector to represent the meanings of the terms. Because the category of the document depends on the meanings than the terms, even though the size of the vector is small, the system can classify the document without degradation of the performance. The system uses the small concept-probability vector so that it can save the time and space for document categorization. The experimental results suggest that the presented system with the thesaurus tool can effectively classify the documents. The results show that even though the system uses the contracted vector for document categorization, the performance of the system is not degraded.