Resampling methods for quality assessment of classifier performance and optimal number of features

  • Authors:
  • Raquel Fandos; Christian Debes; Abdelhak M. Zoubir

  • Venue:
  • Signal Processing
  • Year:
  • 2013


Abstract

We address two fundamental design issues of a classification system: the choice of the classifier and the dimensionality of the optimal feature subset. Resampling techniques are applied to estimate both the probability distribution of the misclassification rate (or any other figure of merit of a classifier) as a function of feature set size, and the probability distribution of the optimal dimensionality given a classification system and a misclassification rate. The latter allows confidence intervals for the optimal feature set size to be estimated. Based on the former, a quality assessment for classifier performance is proposed. Traditionally, classification systems are compared on a fixed feature set; however, a different set may yield different results. The proposed method compares classifiers independently of any pre-selected feature set. The algorithms are tested on 80 sets of synthetic examples and six standard databases of real data. The results are verified by an exhaustive search of the optimum for the simulated data and by two feature selection algorithms for the real data sets.
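The resampling idea described in the abstract can be sketched as follows: repeatedly draw bootstrap samples, train a classifier on each sample for every feature set size, and evaluate it on the out-of-bag observations, yielding an empirical distribution of the misclassification rate per dimensionality. The sketch below is an illustrative assumption, not the paper's exact algorithm; the synthetic data, the nearest-centroid classifier, and all parameter choices are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic two-class data: the first few features are
# informative, the remainder are pure noise (not the paper's data).
n, d, d_informative = 200, 10, 3
X = rng.normal(size=(n, d))
y = rng.integers(0, 2, size=n)
X[y == 1, :d_informative] += 1.5  # shift class 1 along informative features


def nearest_centroid_error(X_tr, y_tr, X_te, y_te):
    """Misclassification rate of a simple nearest-centroid classifier."""
    c0 = X_tr[y_tr == 0].mean(axis=0)
    c1 = X_tr[y_tr == 1].mean(axis=0)
    d0 = np.linalg.norm(X_te - c0, axis=1)
    d1 = np.linalg.norm(X_te - c1, axis=1)
    pred = (d1 < d0).astype(int)
    return float(np.mean(pred != y_te))


def bootstrap_error_distribution(X, y, k, n_boot=100):
    """Bootstrap distribution of the error rate using the first k features.

    Each replicate trains on a bootstrap sample and tests on the
    out-of-bag observations, mimicking the resampling idea in the text.
    """
    n = len(y)
    rates = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)        # bootstrap (with replacement)
        oob = np.setdiff1d(np.arange(n), idx)   # out-of-bag test set
        if oob.size == 0:
            continue
        rates.append(nearest_centroid_error(
            X[idx][:, :k], y[idx], X[oob][:, :k], y[oob]))
    return np.array(rates)


# Empirical error-rate distribution as a function of feature set size;
# quantiles of each distribution give confidence intervals per dimensionality.
dists = {k: bootstrap_error_distribution(X, y, k) for k in range(1, d + 1)}
```

From `dists`, one could estimate the optimal dimensionality as the size minimizing the mean bootstrap error, and read off confidence intervals from the empirical quantiles, in the spirit of the procedure the abstract describes.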