The fitness-rough: A new attribute reduction method based on statistical and rough set theory

Authors:
Yun-Huoy Choo;Azuraliza Abu Bakar;Abdul Razak Hamdan
Affiliations:
Department of Science and System Management, Faculty of Information Science and Technology, The National University of Malaysia, 43600 Bangi, Selangor, Malaysia;(Correspd. Tel.: +60 389216748/ Fax: +60 38925 6732/ E-mail: aab@ftsm.ukm.my) Department of Science and System Management, Faculty of Information Science and Technology, The National University of ...;Department of Science and System Management, Faculty of Information Science and Technology, The National University of Malaysia, 43600 Bangi, Selangor, Malaysia
Venue:
Intelligent Data Analysis
Year:
2008

Abstract

Attribute reduction has become an important pre-processing step for reducing the complexity of data mining tasks. Rough reducts, statistical methods and correlation-based methods have each contributed to improving attribute reduction techniques. Statistical methods generally have lower computational complexity than rough reducts and correlation-based methods, but many studies have shown that the rough reducts method is effective at reducing the attribute set without causing too much information loss. Correlation-based methods, on the other hand, evaluate features as subsets rather than as individual attributes. In this paper, we propose a combination of statistical and rough set methods that selects important attributes in a simpler way while keeping the information loss from the raw data low. The fitness-rough method (FsR) identifies important attributes in the raw data, which is then further simplified into a more compact information table. We also examine the problem of information loss in this method. The proposed method was tested on ten UCI machine learning datasets and compared against the classical rough reducts (RR) method, the statistical entropy (ENT) method and the correlation-based feature selection (CFS) method. Experimental results show that our method performs comparatively well, with higher reduction strength and smaller rule sets than the benchmark methods, especially on medium-sized datasets. However, the FsR method is less efficient on mixed-mode and nominal datasets, because the non-quantitative attributes in these datasets are normally pre-categorised.
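
To illustrate the two families of measures the abstract combines, the sketch below computes an entropy-based information gain for a single attribute and the rough-set dependency degree of an attribute subset on a toy decision table. It is not the FsR algorithm, whose details are not given in this record. The function names, the toy table and its column names are assumptions made for illustration only.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attr, decision):
    """Drop in entropy of `decision` after partitioning the rows on `attr`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attr]].append(row[decision])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[decision] for row in rows]) - remainder


def dependency_degree(rows, attrs, decision):
    """Rough-set dependency degree: the fraction of rows in the positive region.

    A row is in the positive region when every row that is indiscernible
    from it on `attrs` carries the same decision value.
    """
    classes = defaultdict(list)
    for row in rows:
        classes[tuple(row[a] for a in attrs)].append(row[decision])
    positive = sum(len(c) for c in classes.values() if len(set(c)) == 1)
    return positive / len(rows)


# Hypothetical toy decision table (not taken from the paper's UCI datasets).
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]

gain_windy = information_gain(rows, "windy", "play")
gain_outlook = information_gain(rows, "outlook", "play")
gamma_windy = dependency_degree(rows, ["windy"], "play")
gamma_outlook = dependency_degree(rows, ["outlook"], "play")
```

In this toy table `windy` alone determines `play`, so both measures rank it above `outlook`. A statistical ranking of this kind is cheap to compute per attribute, while the dependency degree checks whether a whole subset preserves the decision, which is the property a rough reduct must keep.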