Resampling methods for quality assessment of classifier performance and optimal number of features

  • Authors:
  • Raquel Fandos; Christian Debes; Abdelhak M. Zoubir

  • Venue:
  • Signal Processing
  • Year:
  • 2013


Abstract

We address two fundamental design issues of a classification system: the choice of the classifier and the dimensionality of the optimal feature subset. Resampling techniques are applied to estimate both the probability distribution of the misclassification rate (or any other figure of merit of a classifier) as a function of feature set size, and the probability distribution of the optimal dimensionality given a classification system and a misclassification rate. The latter allows confidence intervals for the optimal feature set size to be estimated. Based on the former, a quality assessment for classifier performance is proposed. Traditionally, classification systems are compared on a fixed feature set; however, a different set may yield different results. The proposed method compares classifiers independently of any pre-selected feature set. The algorithms are tested on 80 sets of synthetic examples and six standard databases of real data. The results are verified by an exhaustive search of the optimum for the simulated data and by two feature selection algorithms for the real data sets.
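The resampling idea described in the abstract can be sketched as follows: repeatedly draw bootstrap samples, train a classifier on each sample for every feature set size, and evaluate it on the out-of-bag observations, yielding an empirical distribution of the misclassification rate per dimensionality. The sketch below is an illustrative assumption, not the paper's exact algorithm; the synthetic data, the nearest-centroid classifier, and all parameter choices are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic two-class data: the first few features are
# informative, the remainder are pure noise (not the paper's data).
n, d, d_informative = 200, 10, 3
X = rng.normal(size=(n, d))
y = rng.integers(0, 2, size=n)
X[y == 1, :d_informative] += 1.5  # shift class 1 along informative features


def nearest_centroid_error(X_tr, y_tr, X_te, y_te):
    """Misclassification rate of a simple nearest-centroid classifier."""
    c0 = X_tr[y_tr == 0].mean(axis=0)
    c1 = X_tr[y_tr == 1].mean(axis=0)
    d0 = np.linalg.norm(X_te - c0, axis=1)
    d1 = np.linalg.norm(X_te - c1, axis=1)
    pred = (d1 < d0).astype(int)
    return float(np.mean(pred != y_te))


def bootstrap_error_distribution(X, y, k, n_boot=100):
    """Bootstrap distribution of the error rate using the first k features.

    Each replicate trains on a bootstrap sample and tests on the
    out-of-bag observations, mimicking the resampling idea in the text.
    """
    n = len(y)
    rates = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)        # bootstrap (with replacement)
        oob = np.setdiff1d(np.arange(n), idx)   # out-of-bag test set
        if oob.size == 0:
            continue
        rates.append(nearest_centroid_error(
            X[idx][:, :k], y[idx], X[oob][:, :k], y[oob]))
    return np.array(rates)


# Empirical error-rate distribution as a function of feature set size;
# quantiles of each distribution give confidence intervals per dimensionality.
dists = {k: bootstrap_error_distribution(X, y, k) for k in range(1, d + 1)}
```

From `dists`, one could estimate the optimal dimensionality as the size minimizing the mean bootstrap error, and read off confidence intervals from the empirical quantiles, in the spirit of the procedure the abstract describes.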