Feature salience definition and estimation and its use in feature subset selection

  • Authors:
  • G. Richards; K. Brazier; W. Wang

  • Affiliations:
  • School of Computing Sciences, University of East Anglia, Norwich, NR4 7TJ, UK. Tel.: +44 (0)1603 592308 / Fax: +44 (0)1603 593344 / E-mail: {gr,kb,wjw}@cmp.uea.ac.uk

  • Venue:
  • Intelligent Data Analysis
  • Year:
  • 2006


Abstract

In this paper we describe novel feature subset selection methods based on the estimation of feature salience, i.e. the quantification of the relative importance of individual features, in the presence of other features, for determining the classes of records in a dataset. We present a definition of feature salience and a method for estimating it. Five synthetic datasets were used to demonstrate the utility of the salience estimation technique; the estimates were found to be good approximations to the calculated saliences in most cases. We then describe three methods of feature subset selection that use feature salience as their basis. These methods were evaluated on real-world datasets by constructing classifiers using all features and comparing them with classifiers constructed using only a selected subset of features. The results compared well with other state-of-the-art techniques, and the methods were simpler to implement and significantly faster to execute. On average, applying our best feature subset selection method resulted in trees that used only 49% of the features used by trees constructed with the full set of features. This reduction in the number of features used was accompanied by a 1% improvement in classifier accuracy.
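The abstract defines salience as a feature's importance for classification *in the presence of other features*, estimated rather than exhaustively computed. The paper's exact estimator is not reproduced here, but the general idea can be sketched with a stand-in: measure how much a classifier's accuracy drops when one feature's values are shuffled across records while the other features are left intact, then keep only features whose drop exceeds a threshold. Everything below (the toy dataset, the nearest-centroid classifier, and the 0.05 cut-off) is an illustrative assumption, not the authors' method.

```python
import random

random.seed(0)

# Toy dataset in the spirit of the paper's synthetic experiments:
# feature 0 determines the class, feature 1 is pure noise.
X = [[random.random(), random.random()] for _ in range(200)]
y = [1 if row[0] > 0.5 else 0 for row in X]
N_FEATURES = 2

def centroids(X, y, feats):
    """Per-class mean vector over the chosen features."""
    cents = {}
    for c in set(y):
        rows = [x for x, yy in zip(X, y) if yy == c]
        cents[c] = [sum(r[f] for r in rows) / len(rows) for f in feats]
    return cents

def accuracy(X, y, feats):
    """Training accuracy of a nearest-centroid classifier on `feats`."""
    cents = centroids(X, y, feats)
    correct = 0
    for x, yy in zip(X, y):
        pred = min(cents, key=lambda c: sum((x[f] - v) ** 2
                                            for f, v in zip(feats, cents[c])))
        correct += pred == yy
    return correct / len(y)

def salience(X, y, f):
    """Stand-in salience: accuracy drop when feature f is shuffled
    across records, with all other features held fixed."""
    feats = list(range(N_FEATURES))
    base = accuracy(X, y, feats)
    col = [x[f] for x in X]
    random.shuffle(col)
    X_perm = [x[:f] + [v] + x[f + 1:] for x, v in zip(X, col)]
    return base - accuracy(X_perm, y, feats)

saliences = [salience(X, y, f) for f in range(N_FEATURES)]
selected = [f for f, s in enumerate(saliences) if s > 0.05]  # assumed cut-off
print(saliences, selected)
```

On this toy data the informative feature 0 shows a large accuracy drop when shuffled, while the noise feature shows almost none, so the selection step keeps only feature 0 — mirroring the paper's goal of building trees from roughly half the original features without losing accuracy.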