A Rough Set-Based Hybrid Method to Text Categorization

Authors:
Yongguang Bao;Satoshi Aoyama;Kazutaka Yamada;Naohiro Ishii;Xiaoyong Du
Venue:
WISE '01 Proceedings of the Second International Conference on Web Information Systems Engineering (WISE'01), Volume 1
Year:
2001

Citing 0
Cited 8

Combining Multiple K-Nearest Neighbor Classifiers for Text Classification by Reducts
DS '02 Proceedings of the 5th International Conference on Discovery Science

A rough-fuzzy document grading system for customized text information retrieval
Information Processing and Management: an International Journal

An efficient feature ranking measure for text categorization
Proceedings of the 2008 ACM symposium on Applied computing

A new customized document categorization scheme using rough membership
Applied Soft Computing

Rough Set Based Social Networking Framework to Retrieve User-Centric Information
RSFDGrC '09 Proceedings of the 12th International Conference on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing

A rough set-based case-based reasoner for text categorization
International Journal of Approximate Reasoning

Classification by multiple reducts-kNN with confidence
IDEAL'10 Proceedings of the 11th international conference on Intelligent data engineering and automated learning

Control of variables in reducts - kNN classification with confidence
KES'11 Proceedings of the 15th international conference on Knowledge-based and intelligent information and engineering systems - Volume Part IV

Abstract

In this paper we present a hybrid text categorization method based on Rough Sets theory. A central problem in text classification for information filtering and retrieval (IF/IR) is the high dimensionality of the data, which may contain many unnecessary and irrelevant features. To cope with this problem, we propose a hybrid technique that combines Latent Semantic Indexing (LSI) with Rough Sets theory (RS). Given corpora of documents and a training set of classified example documents, the technique locates a minimal set of co-ordinate keywords that distinguish between classes of documents, reducing the dimensionality of the keyword vectors. This simplifies the creation of knowledge-based IF/IR systems, speeds up their operation, and allows easy editing of the rule bases employed. In addition, we generate several knowledge bases instead of one for classifying new objects, in the expectation that combining the answers of multiple knowledge bases results in better performance. Multiple knowledge bases can be formulated precisely and in a unified way within the framework of RS. This paper describes the proposed technique, discusses the integration of a keyword acquisition algorithm, Latent Semantic Indexing (LSI), with a Rough Set-based rule generation algorithm, and provides experimental results. The test results show that the hybrid method performs better than the previous rough set-based approach.
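
The idea of locating a minimal keyword set and combining several knowledge bases can be illustrated with a minimal sketch. This is not the authors' algorithm: the toy decision table, the keyword names, the brute-force search, and the helper names (is_consistent, minimal_reducts, classify, vote) are all assumptions made for illustration, and the LSI keyword acquisition step is omitted entirely. The sketch treats each smallest keyword subset that still separates the classes as one "knowledge base" and classifies a new document by majority vote.

```python
from itertools import combinations
from collections import Counter

# Hypothetical toy decision table: binary keyword features plus a class label.
KEYWORDS = ["rough", "fuzzy", "neural", "web"]
DOCS = [
    ((1, 0, 0, 1), "A"),
    ((1, 1, 0, 0), "A"),
    ((0, 1, 1, 0), "B"),
    ((0, 0, 1, 1), "B"),
]


def is_consistent(attrs, docs):
    """True if documents that agree on `attrs` always share the same class."""
    seen = {}
    for features, label in docs:
        key = tuple(features[i] for i in attrs)
        if seen.setdefault(key, label) != label:
            return False
    return True


def minimal_reducts(docs, n_attrs):
    """Brute-force search for the smallest attribute subsets that keep the
    table consistent (only feasible for tiny tables)."""
    for size in range(1, n_attrs + 1):
        found = [c for c in combinations(range(n_attrs), size)
                 if is_consistent(c, docs)]
        if found:
            return found
    return []


def classify(features, reduct, docs):
    """Label of a training document matching on the reduct, or None."""
    key = tuple(features[i] for i in reduct)
    for train_features, label in docs:
        if tuple(train_features[i] for i in reduct) == key:
            return label
    return None


def vote(features, reducts, docs):
    """Majority vote over the per-reduct answers; None if no reduct matches."""
    answers = [classify(features, r, docs) for r in reducts]
    votes = Counter(a for a in answers if a is not None)
    return votes.most_common(1)[0][0] if votes else None


REDUCTS = minimal_reducts(DOCS, len(KEYWORDS))
print([[KEYWORDS[i] for i in r] for r in REDUCTS])
print(vote((1, 0, 0, 0), REDUCTS, DOCS))
```

On this toy table either "rough" or "neural" alone separates the two classes, so the four-dimensional keyword vectors shrink to one dimension per knowledge base. Real corpora would need a heuristic reduct search, since exhaustive enumeration grows exponentially with the number of keywords.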