Less biased measurement of feature selection benefits

  • Authors: Juha Reunanen
  • Affiliations: ABB, Web Imaging Systems, Helsinki, Finland
  • Venue: SLSFS'05 Proceedings of the 2005 international conference on Subspace, Latent Structure and Feature Selection
  • Year: 2005

Abstract

In feature selection, classification accuracy typically needs to be estimated in order to guide the search towards useful subsets. It has been shown earlier [1] that such estimates should not be used directly to determine the optimal subset size or the benefit gained by choosing the optimal subset. The reason is overfitting: because the search picks whichever subsets score best on the estimates that guide it, those estimates tend to be optimistically biased. An outer loop of cross-validation has previously been suggested to counter this problem. However, this paper points out that a straightforward implementation of such an approach still gives biased estimates of the increase in accuracy that could be obtained by selecting the best-performing subset. In addition, two methods are suggested that circumvent this problem and give virtually unbiased results while adding almost no computational overhead.
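To make the outer-loop idea concrete, here is a minimal sketch of feature selection wrapped in an outer loop of cross-validation. It is an illustration under assumptions, not the paper's procedure: the nearest-class-mean classifier, the greedy forward search, the interleaved fold assignment, and every function name (`nearest_mean_predict`, `split`, `cv_accuracy`, `forward_select`, `outer_loop`) are hypothetical choices made for this example. The two methods the abstract proposes are not described in it and are therefore not implemented here.

```python
import random

def nearest_mean_predict(train_X, train_y, test_X, feats):
    """Classify each test row by the closest class mean over the chosen features."""
    means = {}
    for label in set(train_y):
        rows = [x for x, t in zip(train_X, train_y) if t == label]
        means[label] = [sum(r[f] for r in rows) / len(rows) for f in feats]

    def dist(x, label):
        return sum((x[f] - m) ** 2 for f, m in zip(feats, means[label]))

    return [min(means, key=lambda label: dist(x, label)) for x in test_X]

def split(X, y, k, fold):
    """Return (train_X, train_y, test_X, test_y) for one of k interleaved folds."""
    tr = [i for i in range(len(X)) if i % k != fold]
    te = [i for i in range(len(X)) if i % k == fold]
    return [X[i] for i in tr], [y[i] for i in tr], [X[i] for i in te], [y[i] for i in te]

def cv_accuracy(X, y, feats, k):
    """k-fold cross-validated accuracy of the classifier on a feature subset."""
    correct = 0
    for fold in range(k):
        tr_X, tr_y, te_X, te_y = split(X, y, k, fold)
        preds = nearest_mean_predict(tr_X, tr_y, te_X, feats)
        correct += sum(p == t for p, t in zip(preds, te_y))
    return correct / len(X)

def forward_select(X, y, k, max_size):
    """Greedy forward selection guided by inner CV; returns (subset, its CV accuracy)."""
    selected, best_subset, best_acc = [], [], -1.0
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_size:
        # Add the feature whose inclusion gives the highest inner CV accuracy.
        acc, f = max((cv_accuracy(X, y, selected + [f], k), f) for f in remaining)
        selected.append(f)
        remaining.remove(f)
        if acc > best_acc:
            best_acc, best_subset = acc, list(selected)
    return best_subset, best_acc

def outer_loop(X, y, k_outer, k_inner, max_size):
    """Run selection inside each outer fold; score the chosen subset on held-out data."""
    inner, outer = [], []
    for fold in range(k_outer):
        tr_X, tr_y, te_X, te_y = split(X, y, k_outer, fold)
        subset, inner_acc = forward_select(tr_X, tr_y, k_inner, max_size)
        preds = nearest_mean_predict(tr_X, tr_y, te_X, subset)
        inner.append(inner_acc)  # estimate that guided the search
        outer.append(sum(p == t for p, t in zip(preds, te_y)) / len(te_y))
    return sum(inner) / k_outer, sum(outer) / k_outer

if __name__ == "__main__":
    rng = random.Random(0)
    # Pure-noise data: no feature carries information about the label.
    y = [rng.randint(0, 1) for _ in range(40)]
    X = [[rng.gauss(0, 1) for _ in range(8)] for _ in y]
    inner_acc, outer_acc = outer_loop(X, y, k_outer=5, k_inner=4, max_size=3)
    print("mean inner (search-guiding) estimate:", inner_acc)
    print("mean outer (held-out) estimate:", outer_acc)
```

The first returned value averages the estimates that guided the search, and the second averages accuracy on data the search never saw. Comparing the two on data with no real signal is one way to observe the optimism the abstract attributes to the search-guiding estimates. The sketch does not address the paper's further point, that a straightforward outer loop can still misjudge the gain from picking the best-performing subset.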