A Rough Set-Based Hybrid Method to Text Categorization

  • Authors:
  • Yongguang Bao;Satoshi Aoyama;Kazutaka Yamada;Naohiro Ishii;Xiaoyong Du

  • Venue:
  • WISE '01 Proceedings of the Second International Conference on Web Information Systems Engineering (WISE'01) Volume 1
  • Year:
  • 2001

Abstract

In this paper we present a hybrid text categorization method based on Rough Sets theory. A central problem in text classification for information filtering and retrieval (IF/IR) is the high dimensionality of the data, which may contain many unnecessary and irrelevant features. To cope with this problem, we propose a hybrid technique that combines Latent Semantic Indexing (LSI) with Rough Sets theory (RS). Given corpora of documents and a training set of classified example documents, the technique locates a minimal set of co-ordinate keywords that distinguishes between classes of documents, reducing the dimensionality of the keyword vectors. This simplifies the creation of knowledge-based IF/IR systems, speeds up their operation, and allows easy editing of the rule bases employed. In addition, we generate several knowledge bases instead of one for the classification of new objects, in the hope that combining the answers of the multiple knowledge bases results in better performance. Multiple knowledge bases can be formulated precisely and in a unified way within the framework of RS. This paper describes the proposed technique, discusses the integration of a keyword acquisition algorithm, Latent Semantic Indexing (LSI), with a Rough Set-based rule generation algorithm, and provides experimental results. The test results show that the hybrid method performs better than the previous rough set-based approach.
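
The idea of locating a minimal set of keywords that still distinguishes between document classes can be illustrated with a small brute-force sketch. This is not the algorithm from the paper, which the abstract does not spell out; it is only a toy instance of the rough-set notion of a reduct. The function names, the keyword names, and the four-document data set are all invented for illustration, and the exhaustive search is only practical for a handful of keywords.

```python
from itertools import combinations


def is_consistent(docs, labels, keywords):
    """True if documents that agree on `keywords` always share a class."""
    seen = {}
    for doc, label in zip(docs, labels):
        key = tuple(doc[k] for k in keywords)
        # Remember the first class seen for this keyword pattern;
        # a different class for the same pattern means the subset fails.
        if seen.setdefault(key, label) != label:
            return False
    return True


def minimal_reducts(docs, labels):
    """Return all smallest keyword subsets that still separate the classes.

    Exhaustive search by increasing subset size (exponential in the worst
    case); returns [] if even the full keyword set cannot separate them.
    """
    all_keywords = sorted(docs[0])
    for size in range(1, len(all_keywords) + 1):
        found = [set(c) for c in combinations(all_keywords, size)
                 if is_consistent(docs, labels, c)]
        if found:
            return found
    return []


# Hypothetical toy data: 1 = keyword present in the document, 0 = absent.
docs = [
    {"goal": 1, "match": 1, "stock": 0, "bank": 0},
    {"goal": 1, "match": 0, "stock": 0, "bank": 0},
    {"goal": 0, "match": 0, "stock": 1, "bank": 1},
    {"goal": 0, "match": 1, "stock": 1, "bank": 0},
]
labels = ["sport", "sport", "finance", "finance"]

reducts = minimal_reducts(docs, labels)
print(reducts)
```

On this toy data the search finds more than one minimal keyword set. One plausible reading of the multiple-knowledge-base idea, stated here as an assumption rather than as the paper's procedure, is that each such set could seed its own rule base, with the answers of the rule bases then combined.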