CONDOCS: a concept-based document categorization system using concept-probability vector with thesaurus

Authors:
Hyun-Kyu Kang;Jeong-Oog Lee;Heung Seok Jeon;Myeong-Cheol Ko;Doo Hyun Kim;Ryum-Duck Oh;Wonseog Kang
Affiliations:
Dept. of Computer Science, Konkuk University, Chungju-city, Chungbuk, Korea;Dept. of Computer Science, Konkuk University, Chungju-city, Chungbuk, Korea;Dept. of Computer Science, Konkuk University, Chungju-city, Chungbuk, Korea;Dept. of Computer Science, Konkuk University, Chungju-city, Chungbuk, Korea;Dept. of Internet & Multimedia, Konkuk University, Seoul, Korea;Dept. of Computer Science, Chungju National University, Chungju-city, Chungbuk, Korea;Dept. of Computer Engineering Education, Andong National University, Kyungpook, Korea
Venue:
AIS'04 Proceedings of the 13th international conference on AI, Simulation, and Planning in High Autonomy Systems
Year:
2004

Abstract

Traditional approaches to document categorization use term-based classification techniques. Because the number of terms is enormous, these techniques are poorly suited to applications that require fast responses or have limited storage. This paper presents a concept-based document categorization system that classifies Korean documents efficiently by means of a thesaurus tool. The thesaurus tool is an information extractor that acquires the meanings of document terms from a thesaurus, and these acquired meanings support the categorization. The system represents the meanings of the terms with a concept-probability vector. Because the category of a document depends on the meanings of its terms rather than on the terms themselves, the system can classify documents without performance degradation even though the vector is small. The small concept-probability vector also saves time and space during categorization. The experimental results suggest that the presented system with the thesaurus tool classifies documents effectively, and that performance is not degraded even though the system uses the contracted vector.
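
The abstract does not specify how the concept-probability vector is computed or how categories are scored, so the following Python sketch is only one plausible reading of the idea, not the authors' method. It assumes a hypothetical toy thesaurus (`TOY_THESAURUS`, with English entries standing in for the Korean thesaurus), estimates concept probabilities as normalized concept frequencies, splits an ambiguous term evenly across its concepts, and compares documents with hypothetical category profiles by cosine similarity. All names and data here are illustrative assumptions.

```python
import math
from collections import Counter

# Hypothetical toy thesaurus: term -> concepts it may express.
# The real system uses a Korean thesaurus; these entries are illustrative only.
TOY_THESAURUS = {
    "bank": ["finance", "geography"],
    "loan": ["finance"],
    "interest": ["finance"],
    "river": ["geography"],
    "goal": ["sports"],
    "striker": ["sports"],
}

# The vector has one slot per concept, so it stays small however many
# distinct terms appear in the collection.
CONCEPTS = ["finance", "geography", "sports"]


def concept_probability_vector(terms):
    """Map terms to concepts and return normalized concept frequencies."""
    counts = Counter()
    for term in terms:
        concepts = TOY_THESAURUS.get(term, [])
        for concept in concepts:
            # Assumption: an ambiguous term splits its weight evenly.
            counts[concept] += 1.0 / len(concepts)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(CONCEPTS)
    return [counts[c] / total for c in CONCEPTS]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def classify(terms, category_profiles):
    """Return the category whose profile is most similar to the document."""
    vec = concept_probability_vector(terms)
    return max(category_profiles, key=lambda cat: cosine(vec, category_profiles[cat]))


# Hypothetical category profiles expressed in the same concept space.
PROFILES = {
    "economy": [1.0, 0.0, 0.0],
    "travel": [0.0, 1.0, 0.0],
    "sports": [0.0, 0.0, 1.0],
}

doc_terms = ["bank", "loan", "interest"]
doc_vector = concept_probability_vector(doc_terms)
predicted = classify(doc_terms, PROFILES)
```

In this sketch the document is represented by three numbers, one per concept, rather than by one weight per distinct term, which is the kind of time and space saving the abstract attributes to the contracted vector.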