Joint sampling distribution between actual and estimated classification errors for linear discriminant analysis

Authors:
Amin Zollanvari;Ulisses M. Braga-Neto;Edward R. Dougherty
Affiliations:
Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX;Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX;Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX and Translational Genomics Institute, Phoenix, AZ
Venue:
IEEE Transactions on Information Theory - Special issue on information theory in molecular biology and neuroscience
Year:
2010

Abstract

Error estimation must be used to find the accuracy of a designed classifier, an issue that is critical in biomarker discovery for disease diagnosis and prognosis in genomics and proteomics. This paper presents, for what is believed to be the first time, the analytical formulation for the joint sampling distribution of the actual and estimated errors of a classification rule. The analysis presented here concerns the linear discriminant analysis (LDA) classification rule and the resubstitution and leave-one-out error estimators, under a general parametric Gaussian assumption. Exact results are provided in the univariate case, and a simple method is suggested to obtain an accurate approximation in the multivariate case. It is also shown how these results can be applied in the computation of conditional bounds and the regression of the actual error, given the observed error estimate. In contrast to asymptotic results, the analysis presented here is applicable to finite training data. In particular, it applies in the small-sample settings commonly found in genomics and proteomics applications. Numerical examples, which include parameters estimated from actual microarray data, illustrate the analysis throughout.
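
To make the quantities in the abstract concrete, the following is a minimal Monte Carlo sketch of the joint behavior of the actual error and the resubstitution and leave-one-out estimates for univariate LDA. It is not the paper's analytical formulation, which gives exact results; it only simulates the same three quantities. The equal class priors, the common variance, the default parameter values, and all function names (`simulate`, `actual_error`, `pearson`, and so on) are assumptions made for this illustration.

```python
import math
import random


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def classify(x, m0, m1):
    """Univariate LDA with equal priors (an assumption of this sketch):
    label 0 if x lies on the class-0 side of the midpoint of the sample means."""
    return 0 if (x - 0.5 * (m0 + m1)) * (m0 - m1) >= 0 else 1


def actual_error(m0, m1, mu0, mu1, sigma):
    """Exact error of the designed rule under N(mu0, sigma^2) vs N(mu1, sigma^2)."""
    if m0 == m1:
        return 0.5  # degenerate rule: every point is labeled 0
    mid = 0.5 * (m0 + m1)
    below0 = normal_cdf((mid - mu0) / sigma)  # P(X <= mid | class 0)
    below1 = normal_cdf((mid - mu1) / sigma)  # P(X <= mid | class 1)
    if m0 < m1:  # class-0 region is x <= mid
        return 0.5 * ((1.0 - below0) + below1)
    return 0.5 * (below0 + (1.0 - below1))  # class-0 region is x >= mid


def resubstitution(x0, x1):
    """Fraction of training points misclassified by the rule they designed."""
    m0, m1 = sum(x0) / len(x0), sum(x1) / len(x1)
    wrong = sum(classify(x, m0, m1) != 0 for x in x0)
    wrong += sum(classify(x, m0, m1) != 1 for x in x1)
    return wrong / (len(x0) + len(x1))


def leave_one_out(x0, x1):
    """Each point is classified by the rule designed without it (needs n0, n1 >= 2)."""
    n0, n1 = len(x0), len(x1)
    s0, s1 = sum(x0), sum(x1)
    wrong = 0
    for x in x0:
        wrong += classify(x, (s0 - x) / (n0 - 1), s1 / n1) != 0
    for x in x1:
        wrong += classify(x, s0 / n0, (s1 - x) / (n1 - 1)) != 1
    return wrong / (n0 + n1)


def simulate(n0=5, n1=5, mu0=0.0, mu1=1.0, sigma=1.0, trials=2000, seed=0):
    """Return a list of (actual, resubstitution, leave-one-out) error triples."""
    rng = random.Random(seed)
    out = []
    for _ in range(trials):
        x0 = [rng.gauss(mu0, sigma) for _ in range(n0)]
        x1 = [rng.gauss(mu1, sigma) for _ in range(n1)]
        m0, m1 = sum(x0) / n0, sum(x1) / n1
        out.append((actual_error(m0, m1, mu0, mu1, sigma),
                    resubstitution(x0, x1),
                    leave_one_out(x0, x1)))
    return out


def pearson(a, b):
    """Sample correlation coefficient; NaN if either sequence is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    if saa == 0 or sbb == 0:
        return float("nan")
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / math.sqrt(saa * sbb)


if __name__ == "__main__":
    actual, resub, loo = zip(*simulate())
    print("mean actual error:", sum(actual) / len(actual))
    print("mean resubstitution:", sum(resub) / len(resub))
    print("mean leave-one-out:", sum(loo) / len(loo))
    print("corr(actual, resub):", pearson(actual, resub))
    print("corr(actual, loo):", pearson(actual, loo))
```

A two-dimensional histogram of the simulated (actual, estimated) pairs gives an empirical approximation of the joint sampling distribution. Averaging the actual error over the trials that share a given estimate value gives a rough empirical counterpart of the regression of the actual error on the observed estimate.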