Topic-specific text filtering based on multiple reducts

Authors:
Qiang Li;Jianhua Li
Affiliations:
Modern Communication Institute, Shanghai Jiaotong Univ., Shanghai, P.R. China;Modern Communication Institute, Shanghai Jiaotong Univ., Shanghai, P.R. China
Venue:
AIS-ADM 2005 Proceedings of the 2005 international conference on Autonomous Intelligent Systems: agents and Data Mining
Year:
2005

Abstract

Feature selection is an important step in text preprocessing: a well-chosen feature subset can match the performance of the full feature set while reducing learning time. To fit our system to its application and to embed the model in a gateway for real-time text filtering, we need to select features more accurately. In this paper, we propose a new feature selection method based on rough set theory. It generates several reducts that share no common attributes, so the selected attributes are better able to classify new objects, especially in real application data. We evaluate the method on two data sets: a benchmark data set from the UCI machine learning archive and a data set captured from the Web. We classify the objects with statistical classification methods. On the benchmark test set, we obtain good precision with a single reduct; on the real data set, we obtain good precision with several reducts. The real data set is used in our system for topic-specific text filtering. We therefore conclude that the method is effective in application. We also conclude that, with the same selected features on an imbalanced data set, the SVM and VSM methods perform better, while the Naïve Bayes method performs poorly.
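
The idea of several reducts with no common attributes can be illustrated with a minimal sketch. This is not the algorithm from the paper, which the abstract does not specify. It assumes a consistent decision table with discrete attribute values and uses a simple greedy backward elimination, removing each reduct's attributes from the pool before searching for the next one. The names `is_consistent` and `disjoint_reducts` and the toy table are hypothetical.

```python
def is_consistent(rows, labels, attrs):
    """Return True if rows that agree on every attribute in attrs share a label."""
    seen = {}
    for row, label in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        # setdefault stores the first label seen for this key and returns it
        if seen.setdefault(key, label) != label:
            return False
    return True


def disjoint_reducts(rows, labels):
    """Greedily collect attribute-disjoint reducts (illustrative assumption only)."""
    remaining = list(range(len(rows[0])))
    reducts = []
    # Keep going while the unused attributes still separate the classes
    while remaining and is_consistent(rows, labels, remaining):
        reduct = list(remaining)
        for a in list(reduct):
            trial = [b for b in reduct if b != a]
            # Drop an attribute only if the rest still separates the classes
            if trial and is_consistent(rows, labels, trial):
                reduct = trial
        reducts.append(reduct)
        # Attributes already used in a reduct are excluded from later ones
        remaining = [a for a in remaining if a not in reduct]
    return reducts


# Toy table: four binary term-presence features per document (made-up data)
rows = [
    (1, 0, 1, 0),
    (1, 1, 1, 1),
    (0, 0, 0, 1),
    (0, 1, 0, 0),
]
labels = ["on-topic", "on-topic", "off-topic", "off-topic"]

reducts = disjoint_reducts(rows, labels)
print(reducts)
```

In the toy table, attributes 0 and 2 each determine the label alone, while attributes 1 and 3 do so only together, so the sketch returns three reducts that share no attribute. A classifier could then be trained on the union of these attributes, or on each reduct separately; which of these the paper does is not stated in the abstract.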