Generalized rough sets based feature selection

  • Authors:
  • Mohamed Quafafou; Moussa Boussouf

  • Affiliations:
  • IRIN, University of Nantes, 2 rue de la Houssiniere, BP 92208 - 44322, Nantes Cedex 03, France. E-mail: {quafafou, boussouf}@irin.univ-nantes.fr

  • Venue:
  • Intelligent Data Analysis
  • Year:
  • 2000

Abstract

The problem of feature subset selection can be defined as the selection of a relevant subset of features which allows a learning algorithm to induce small, high-accuracy models. This problem is of primary importance because irrelevant and redundant features may degrade the learner's speed, especially in the context of high dimensionality, and reduce both the accuracy and the comprehensibility of the induced model. Two main approaches have been developed: the first is algorithm-independent (the filter approach) and considers only the data, whereas the second is algorithm-dependent (the wrapper approach) and takes into account both the data and a given learning algorithm. Recent work has studied the value of rough set theory, and more particularly its notions of reducts and core, for dealing with the problem of feature subset selection. Different methods have been proposed to select features using both the core and the reduct concepts, whereas other research shows that useful feature subsets do not necessarily contain all features in cores. In this paper, we underline the fact that rough set theory is concerned with the deterministic analysis of attribute dependencies, which are the basis of the two notions of reduct and core. We extend the notion of dependency so that both deterministic and non-deterministic dependencies can be found. A new notion of strong reducts is then introduced and leads to the definition of strong feature subsets (SFS). The interest of SFS is illustrated by the improvement of the accuracy of C4.5 on real-world datasets. Our study shows that, in general, the highest-accuracy subset is not the best one with regard to the filter criteria. The highest-accuracy subset is found by the new approach with minimum cost.
The contribution of this work is fourfold: (1) analysis of feature subset selection in the rough sets context, (2) introduction of new definitions based on a generalized rough set theory, i.e., α-RST, (3) reformulation of the selection problem, and (4) description of a hybrid method combining both the filter and the wrapper approaches.
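
To make the notion of attribute dependency mentioned in the abstract more concrete, the following is a minimal Python sketch of the classical rough set dependency degree, i.e., the fraction of objects whose equivalence class under the chosen features carries a single decision label. The `threshold` parameter is a hypothetical relaxation added here only to illustrate what a non-deterministic dependency could look like; it is not the paper's α-RST definition, which the abstract does not spell out. The function names and the toy data are likewise assumptions made for illustration.

```python
from collections import Counter, defaultdict
from fractions import Fraction


def partition(rows, features):
    """Group row indices into blocks of rows that agree on every feature in `features`."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[f] for f in features)].append(i)
    return list(blocks.values())


def dependency(rows, labels, features, threshold=1):
    """Fraction of rows lying in blocks whose majority-label share is >= threshold.

    threshold=1 gives the classical (deterministic) dependency degree.
    threshold<1 is a hypothetical relaxation, used here for illustration only.
    """
    covered = 0
    for block in partition(rows, features):
        majority = Counter(labels[i] for i in block).most_common(1)[0][1]
        if Fraction(majority, len(block)) >= threshold:
            covered += len(block)
    return Fraction(covered, len(rows))


# Toy decision table (invented for this example): two features and one label.
rows = [
    {"a": 0, "b": 0},
    {"a": 0, "b": 0},
    {"a": 0, "b": 0},
    {"a": 0, "b": 0},
    {"a": 1, "b": 0},
    {"a": 1, "b": 1},
]
labels = ["yes", "yes", "yes", "no", "no", "no"]

print(dependency(rows, labels, ["a", "b"]))                  # 1/3
print(dependency(rows, labels, ["a"]))                       # 1/3
print(dependency(rows, labels, ["a", "b"], Fraction(3, 4)))  # 1
```

In this toy table, dropping feature `b` leaves the deterministic dependency unchanged, which is the kind of redundancy a reduct removes. Lowering the threshold lets the mostly consistent block count as well, so the same features then account for every object.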