Considerations of sample and feature size

Authors: D. Foley
Venue: IEEE Transactions on Information Theory
Year: 2006

Abstract

In many practical pattern-classification problems, the underlying probability distributions are not completely known. Consequently, the classification logic must be determined from vector samples gathered for each class. Although it is common knowledge that the error rate on the design set is a biased estimate of the true error rate of the classifier, the amount of bias as a function of the sample size per class and the feature size has been an open question. In this paper, the design-set error rate for a two-class problem with multivariate normal distributions is derived as a function of the sample size per class (N) and the dimensionality (L). The design-set error rate is compared with both the corresponding Bayes error rate and the test-set error rate. It is demonstrated that the design-set error rate is an extremely biased estimate of either the Bayes or the test-set error rate if the ratio of samples per class to dimensions (N/L) is less than three. The variance of the design-set error rate is also approximated by a function that is bounded by 1/(8N).
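The comparison described in the abstract can be reproduced qualitatively with a small simulation. The Python sketch below is a minimal Monte Carlo experiment, not the paper's analytical derivation, and it rests on assumptions that do not come from the abstract: both classes share the identity covariance, their means are separated by a hypothetical Mahalanobis distance `delta`, priors are equal, and the classifier is a plug-in linear discriminant built from the sample means and the pooled sample covariance. In each trial it trains the rule on N samples per class, scores the design set to get the design-set (resubstitution) error, and evaluates the same rule exactly against the true Gaussians to get its test-set error. The Bayes error for this setup is Phi(-delta/2). The function `simulate` and its parameters are illustrative names. The across-trial variance it returns can be set beside the 1/(8N) bound, keeping in mind that the abstract states that bound for an approximating function, not for every finite-sample case.

```python
import math

import numpy as np


def normal_cdf(z):
    """Standard normal CDF Phi(z), written with math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def simulate(n_per_class, dim, delta=2.0, trials=200, seed=0):
    """Monte Carlo estimate of design-set, test-set and Bayes error rates.

    Hypothetical setup (assumed, not taken from the paper): classes
    N(m1, I) and N(m2, I) with equal priors, m1 = 0, m2 = delta * e_1,
    and a plug-in linear discriminant built from the sample means and the
    pooled sample covariance of n_per_class samples per class.
    Returns (mean design error, mean test error, Bayes error,
    variance of the design error across trials).
    """
    if n_per_class < 2:
        raise ValueError("need at least 2 samples per class")
    rng = np.random.default_rng(seed)
    m1 = np.zeros(dim)
    m2 = np.zeros(dim)
    m2[0] = delta
    design, test = [], []
    for _ in range(trials):
        x1 = rng.standard_normal((n_per_class, dim)) + m1
        x2 = rng.standard_normal((n_per_class, dim)) + m2
        mean1, mean2 = x1.mean(axis=0), x2.mean(axis=0)
        # Pooled covariance estimate with 2N - 2 degrees of freedom.
        centered = np.vstack([x1 - mean1, x2 - mean2])
        pooled = centered.T @ centered / (2 * n_per_class - 2)
        # pinv keeps the sketch running if the estimate is singular (2N - 2 < L).
        w = np.linalg.pinv(pooled) @ (mean1 - mean2)
        b = -w @ (mean1 + mean2) / 2.0
        # Rule: assign class 1 when w.x + b > 0, class 2 otherwise.
        # Design-set (resubstitution) error: score the training samples.
        design_err1 = np.mean(x1 @ w + b <= 0)
        design_err2 = np.mean(x2 @ w + b > 0)
        design.append((design_err1 + design_err2) / 2.0)
        # Test-set error of this trained rule, computed exactly: under
        # identity covariance, w.x is normal with mean w.m_k, variance ||w||^2.
        norm_w = np.linalg.norm(w)
        test_err1 = normal_cdf(-(w @ m1 + b) / norm_w)
        test_err2 = normal_cdf((w @ m2 + b) / norm_w)
        test.append((test_err1 + test_err2) / 2.0)
    # Bayes error for equal priors and Mahalanobis distance delta.
    bayes = normal_cdf(-delta / 2.0)
    return float(np.mean(design)), float(np.mean(test)), bayes, float(np.var(design))


if __name__ == "__main__":
    dim = 10
    for ratio in (1, 2, 3, 5, 10):
        n = ratio * dim
        d, t, bayes, var_d = simulate(n, dim)
        print(f"N/L={ratio:>2}  design={d:.3f}  test={t:.3f}  "
              f"bayes={bayes:.3f}  var={var_d:.5f}  1/(8N)={1 / (8 * n):.5f}")
```

With L = 10, the run at N/L = 1 should show a design-set error well below Phi(-1) ≈ 0.159 and a test-set error above it, with the gap between the two shrinking as N/L grows; exact values depend on the seed and the number of trials. Computing the test-set error in closed form, rather than on a held-out sample, removes one source of simulation noise from the comparison.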