Feature salience definition and estimation and its use in feature subset selection

  • Authors:
  • G. Richards; K. Brazier; W. Wang

  • Affiliations:
  • School of Computing Sciences, University of East Anglia, Norwich, NR4 7TJ, UK. Tel.: +44 (0)1603 592308 / Fax: +44 (0)1603 593344 / E-mail: {gr,kb,wjw}@cmp.uea.ac.uk

  • Venue:
  • Intelligent Data Analysis
  • Year:
  • 2006


Abstract

In this paper we describe novel feature subset selection methods based on the estimation of feature salience, i.e. the quantification of the relative importance of individual features, in the presence of other features, for determining the classes of records in a dataset. We present a definition of feature salience and a method for estimating it. Five synthetic datasets were used to demonstrate the utility of the salience estimation technique; the estimates were found to be good approximations to the calculated saliences in most cases. We then describe three methods of feature subset selection that use feature salience as their basis. These methods were evaluated on real-world datasets by constructing classifiers using all features and comparing them with classifiers constructed using only a selected subset of features. The results compared well with other state-of-the-art techniques, and the methods were simpler to implement and significantly faster to execute. On average, applying our best feature subset selection method resulted in trees that used only 49% of the features used by trees constructed with the full set of features. This reduction in the number of features used was accompanied by a 1% improvement in classifier accuracy.
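The abstract defines salience as a feature's importance for classification *in the presence of other features*, estimated rather than exhaustively computed. The paper's exact estimator is not reproduced here, but the general idea can be sketched with a stand-in: measure how much a classifier's accuracy drops when one feature's values are shuffled across records while the other features are left intact, then keep only features whose drop exceeds a threshold. Everything below (the toy dataset, the nearest-centroid classifier, and the 0.05 cut-off) is an illustrative assumption, not the authors' method.

```python
import random

random.seed(0)

# Toy dataset in the spirit of the paper's synthetic experiments:
# feature 0 determines the class, feature 1 is pure noise.
X = [[random.random(), random.random()] for _ in range(200)]
y = [1 if row[0] > 0.5 else 0 for row in X]
N_FEATURES = 2

def centroids(X, y, feats):
    """Per-class mean vector over the chosen features."""
    cents = {}
    for c in set(y):
        rows = [x for x, yy in zip(X, y) if yy == c]
        cents[c] = [sum(r[f] for r in rows) / len(rows) for f in feats]
    return cents

def accuracy(X, y, feats):
    """Training accuracy of a nearest-centroid classifier on `feats`."""
    cents = centroids(X, y, feats)
    correct = 0
    for x, yy in zip(X, y):
        pred = min(cents, key=lambda c: sum((x[f] - v) ** 2
                                            for f, v in zip(feats, cents[c])))
        correct += pred == yy
    return correct / len(y)

def salience(X, y, f):
    """Stand-in salience: accuracy drop when feature f is shuffled
    across records, with all other features held fixed."""
    feats = list(range(N_FEATURES))
    base = accuracy(X, y, feats)
    col = [x[f] for x in X]
    random.shuffle(col)
    X_perm = [x[:f] + [v] + x[f + 1:] for x, v in zip(X, col)]
    return base - accuracy(X_perm, y, feats)

saliences = [salience(X, y, f) for f in range(N_FEATURES)]
selected = [f for f, s in enumerate(saliences) if s > 0.05]  # assumed cut-off
print(saliences, selected)
```

On this toy data the informative feature 0 shows a large accuracy drop when shuffled, while the noise feature shows almost none, so the selection step keeps only feature 0 — mirroring the paper's goal of building trees from roughly half the original features without losing accuracy.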