CONDOCS: a concept-based document categorization system using concept-probability vector with thesaurus

  • Authors:
  • Hyun-Kyu Kang;Jeong-Oog Lee;Heung Seok Jeon;Myeong-Cheol Ko;Doo Hyun Kim;Ryum-Duck Oh;Wonseog Kang

  • Affiliations:
  • Dept. of Computer Science, Konkuk University, Chungju-city, Chungbuk, Korea;Dept. of Computer Science, Konkuk University, Chungju-city, Chungbuk, Korea;Dept. of Computer Science, Konkuk University, Chungju-city, Chungbuk, Korea;Dept. of Computer Science, Konkuk University, Chungju-city, Chungbuk, Korea;Dept. of Internet & Multimedia, Konkuk University, Seoul, Korea;Dept. of Computer Science, Chungju National University, Chungju-city, Chungbuk, Korea;Dept. of Computer Engineering Education, Andong National University, Kyungpook, Korea

  • Venue:
  • AIS'04 Proceedings of the 13th international conference on AI, Simulation, and Planning in High Autonomy Systems
  • Year:
  • 2004

Abstract

Traditional approaches to document categorization rely on term-based classification techniques. Because they must handle an enormous number of terms, these techniques are poorly suited to applications that need fast responses or have limited storage. This paper presents a concept-based document categorization system that efficiently classifies Korean documents by means of a thesaurus tool. The thesaurus tool is an information extractor that acquires the meanings of document terms from a thesaurus, and the system categorizes documents using those acquired meanings. The system represents the meanings of the terms with a concept-probability vector. Because the category of a document depends more on the meanings than on the terms themselves, the system can classify documents without degrading performance even though the vector is small. The small concept-probability vector also saves time and space during categorization. The experimental results suggest that the presented system with the thesaurus tool classifies documents effectively, and that performance does not degrade even though the system uses the contracted vector.
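
The abstract does not say how the concept-probability vector is computed or which classifier is used, so the following is only a minimal Python sketch of the general idea, not the authors' method. It assumes a toy English thesaurus (`THESAURUS`), a fixed concept list (`CONCEPTS`), an even split of weight across the concepts of an ambiguous term, and cosine similarity against per-category profiles (`PROFILES`). All of these names, entries, and choices are hypothetical and chosen for illustration.

```python
import math
from collections import Counter

# Hypothetical toy thesaurus: term -> concepts it may express.
# The paper's system looks terms up in a Korean thesaurus; these entries are invented.
THESAURUS = {
    "stock": ["finance"],
    "bank": ["finance", "geography"],
    "loan": ["finance"],
    "river": ["geography"],
    "mountain": ["geography"],
    "goal": ["sports"],
    "match": ["sports"],
}
# Assumed fixed concept inventory; the vector has one slot per concept,
# so its size does not grow with the vocabulary.
CONCEPTS = ["finance", "geography", "sports"]


def concept_probability_vector(terms):
    """Map terms to concepts and normalize the concept weights to sum to 1."""
    mass = Counter()
    for term in terms:
        concepts = THESAURUS.get(term, [])  # terms missing from the thesaurus are skipped
        for concept in concepts:
            # Assumption: an ambiguous term splits its weight evenly.
            mass[concept] += 1.0 / len(concepts)
    total = sum(mass.values())
    if total == 0:
        return [0.0] * len(CONCEPTS)
    return [mass[c] / total for c in CONCEPTS]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def classify(terms, profiles):
    """Return the category whose profile vector is most similar to the document's."""
    vec = concept_probability_vector(terms)
    return max(profiles, key=lambda name: cosine(vec, profiles[name]))


# Hypothetical category profiles built from a few example terms per category.
PROFILES = {
    "economy": concept_probability_vector(["stock", "loan", "bank"]),
    "travel": concept_probability_vector(["river", "mountain"]),
}

doc_vector = concept_probability_vector(["bank", "loan", "unknownword"])
label = classify(["bank", "loan", "unknownword"], PROFILES)
print(doc_vector, label)
```

In this sketch the document vector has three entries no matter how many distinct terms the document contains, which is the kind of contraction the abstract credits with saving time and space.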