Effects of Many Feature Candidates in Feature Selection and Classification

  • Authors:
  • Helene Schulerud; Fritz Albregtsen

  • Venue:
  • Proceedings of the Joint IAPR International Workshop on Structural, Syntactic, and Statistical Pattern Recognition
  • Year:
  • 2002

Abstract

We address the problems of analyzing many feature candidates when performing feature selection and error estimation on a limited data set. A Monte Carlo study of multivariate normally distributed data was performed to illustrate the problems. Two feature selection methods are tested: Plus-1-Minus-1 and Sequential Forward Floating Selection. The simulations demonstrate that in order to find the correct features, the number of features initially analyzed is an important factor, in addition to the number of samples. Moreover, the ratio of training samples to feature candidates that suffices is not constant; it depends on the number of feature candidates, the number of training samples, and the Mahalanobis distance between the classes. The two feature selection methods analyzed gave the same results. Furthermore, the simulations demonstrate how the leave-one-out error estimate can be highly biased when feature selection is performed on the same data used for error estimation. It may even indicate complete separation of the classes when no real difference between the classes exists.
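
The selection bias described in the abstract can be reproduced with a small simulation. The sketch below is not the authors' code; it is a minimal illustration under assumed settings (20 samples per class, 100 candidate features, 2 features selected, a nearest-class-mean classifier, and plain sequential forward selection as a stand-in for Plus-1-Minus-1/SFFS). Both classes are drawn from the same multivariate normal distribution, so the true error rate is 0.5; because feature selection and the leave-one-out estimate use the same data, the reported error comes out far lower.

```python
# Minimal sketch of feature-selection bias in leave-one-out error estimation.
# Assumed settings, not the paper's exact experiment.
import numpy as np

rng = np.random.default_rng(0)

n_per_class = 20    # training samples per class (assumed)
n_candidates = 100  # number of candidate features (assumed)
n_selected = 2      # features kept by forward selection (assumed)

# Both classes drawn from N(0, I): any apparent separation is spurious.
X = rng.standard_normal((2 * n_per_class, n_candidates))
y = np.array([0] * n_per_class + [1] * n_per_class)

def loo_error(X, y):
    """Leave-one-out error of a nearest-class-mean classifier."""
    errors = 0
    for i in range(len(y)):
        mask = np.ones(len(y), dtype=bool)
        mask[i] = False
        Xtr, ytr = X[mask], y[mask]
        means = np.array([Xtr[ytr == c].mean(axis=0) for c in (0, 1)])
        pred = np.argmin(((X[i] - means) ** 2).sum(axis=1))
        errors += pred != y[i]
    return errors / len(y)

# Sequential forward selection run on ALL the data: the source of the bias.
selected = []
for _ in range(n_selected):
    best_f, best_err = None, np.inf
    for f in range(n_candidates):
        if f in selected:
            continue
        err = loo_error(X[:, selected + [f]], y)
        if err < best_err:
            best_f, best_err = f, err
    selected.append(best_f)

print("selected features:", selected)
# Optimistically low, despite a true error rate of 0.5.
print("leave-one-out error on the same data:", loo_error(X[:, selected], y))
```

Repeating the run with more candidate features (or fewer samples) drives the biased estimate further below 0.5, which mirrors the abstract's point that the number of feature candidates, and not only the sample size, governs the severity of the bias.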