An Intelligent Information System for Organizing Online Text Documents

  • Authors: Han-Joon Kim; Sang-Goo Lee

  • Affiliations: The University of Seoul, Department of Electrical and Computer Engineering, Korea; Seoul National University, School of Computer Science and Engineering, Korea

  • Venue: Knowledge and Information Systems
  • Year: 2004

Abstract

This paper describes an intelligent information system that manages large volumes of online text documents (such as Web documents) hierarchically. The system's organizational capabilities evolve semi-automatically, with minimal human input. It starts from an initial taxonomy into which documents are automatically categorized, and then evolves to keep providing a good indexing service as the document collection grows or its usage changes. To this end, we propose a series of algorithms based on text-mining technologies: document clustering, document categorization, and hierarchy reorganization. In particular, the clustering and categorization algorithms are studied intensively so that both the hierarchical structure and the categorization criteria can evolve. Through experiments on the Reuters-21578 document collection, we evaluate the proposed clustering and categorization methods by comparing their performance with that of well-known conventional methods.
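
To make the workflow in the abstract concrete, here is a minimal Python sketch of one possible reading of it: documents are assigned to the closest category of an initial taxonomy, and categories that grow too large are flagged for reorganization. This is not the authors' algorithm. The bag-of-words vectors, the cosine-similarity centroid classifier, the seed documents, and the `MAX_DOCS` threshold are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def vectorize(text):
    """Turn a text into a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def categorize(doc, centroids):
    """Return the category whose centroid is most similar to the document."""
    vec = vectorize(doc)
    return max(centroids, key=lambda c: cosine(vec, centroids[c]))

# Hypothetical initial taxonomy: category name -> a few seed documents.
seeds = {
    "grain": ["wheat corn harvest export", "corn wheat crop tonnes"],
    "money": ["bank interest rate dollar", "dollar currency bank market"],
}
# Each centroid is the summed term-frequency vector of its seed documents.
centroids = {c: sum((vectorize(d) for d in docs), Counter())
             for c, docs in seeds.items()}

# Newly arriving documents are categorized automatically.
incoming = [
    "wheat export tonnes rise",
    "bank cuts interest rate",
    "corn crop harvest delayed",
]
assigned = {c: [] for c in seeds}
for doc in incoming:
    assigned[categorize(doc, centroids)].append(doc)

# Assumed trigger for reorganization: a category holding more than
# MAX_DOCS documents becomes a candidate for splitting via clustering.
MAX_DOCS = 1
to_reorganize = [c for c, docs in assigned.items() if len(docs) > MAX_DOCS]
```

In this toy run the "grain" category receives two documents and is flagged. In a fuller system, a clustering step would then propose sub-categories for it and a human would confirm the change, which is where the semi-automatic aspect comes in.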