Exact representation of the second-order moments for resubstitution and leave-one-out error estimation for linear discriminant analysis in the univariate heteroskedastic Gaussian model

  • Authors: Amin Zollanvari; Ulisses Braga-Neto; Edward R. Dougherty

  • Affiliations:
      - Children's Hospital Informatics Program at Harvard-MIT Division of Health Science and Technology, Boston, MA 02115, United States and Brigham and Women's Hospital, Boston, MA 02115, United States ...
      - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, United States
      - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, United States and Translational Genomics Research Institute (TGEN), Phoenix, AZ 85004, United St ...

  • Venue: Pattern Recognition
  • Year: 2012

Abstract

This paper provides exact analytical expressions for the bias, variance, and root-mean-square (RMS) error of the resubstitution and leave-one-out error estimators in the case of linear discriminant analysis (LDA) in the univariate heteroskedastic Gaussian model. Neither the variances nor the sample sizes for the two classes need be the same. Allowing heteroskedasticity (unequal variances) is a fundamental feature of this work and distinguishes it from past work. The expected resubstitution and leave-one-out errors are represented by probabilities involving bivariate Gaussian distributions. Their second moments and cross-moments with the actual error are represented by 3- and 4-variate Gaussian distributions. From these, the bias, deviation variance, and RMS for resubstitution and leave-one-out as estimators of the actual error can be computed. The RMS expressions are applied to the determination of sample size and illustrated in biomarker classification.
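
To make the quantities named in the abstract concrete, the following is a minimal Monte Carlo sketch, not the paper's exact analytical method. It simulates the univariate heteroskedastic Gaussian model and estimates the bias, deviation variance, and RMS of resubstitution and leave-one-out by sampling, using the standard identity RMS^2 = bias^2 + deviation variance. Several things here are assumptions for illustration only: the discriminant form W(x) = (x - midpoint of the sample means) * (mean0 - mean1), class priors set equal to the sampling ratio n0/(n0 + n1), and every function and parameter name.

```python
import math
import random


def phi(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def lda_label(x, mean0, mean1):
    """Assumed univariate LDA rule: label 0 if (x - midpoint) * (mean0 - mean1) > 0."""
    w = (x - 0.5 * (mean0 + mean1)) * (mean0 - mean1)
    return 0 if w > 0 else 1


def actual_error(mean0, mean1, mu0, sigma0, mu1, sigma1, prior0):
    """True error of the designed classifier under the Gaussian model."""
    mid = 0.5 * (mean0 + mean1)
    s = 1.0 if mean0 > mean1 else -1.0  # which side of the threshold is class 0
    err0 = phi(s * (mid - mu0) / sigma0)  # class-0 point lands on the class-1 side
    err1 = phi(s * (mu1 - mid) / sigma1)  # class-1 point lands on the class-0 side
    return prior0 * err0 + (1.0 - prior0) * err1


def resubstitution(x0, x1):
    """Fraction of training points misclassified by the classifier they trained."""
    m0, m1 = sum(x0) / len(x0), sum(x1) / len(x1)
    wrong = sum(lda_label(x, m0, m1) != 0 for x in x0)
    wrong += sum(lda_label(x, m0, m1) != 1 for x in x1)
    return wrong / (len(x0) + len(x1))


def leave_one_out(x0, x1):
    """Each point is classified using sample means computed without that point."""
    n0, n1 = len(x0), len(x1)
    s0, s1 = sum(x0), sum(x1)
    wrong = sum(lda_label(x, (s0 - x) / (n0 - 1), s1 / n1) != 0 for x in x0)
    wrong += sum(lda_label(x, s0 / n0, (s1 - x) / (n1 - 1)) != 1 for x in x1)
    return wrong / (n0 + n1)


def summarize(estimates, actuals):
    """Bias, deviation variance, and RMS of (estimated error - actual error)."""
    devs = [e - a for e, a in zip(estimates, actuals)]
    bias = sum(devs) / len(devs)
    dev_var = sum((d - bias) ** 2 for d in devs) / len(devs)
    rms = math.sqrt(sum(d * d for d in devs) / len(devs))
    return bias, dev_var, rms


def simulate(n0, n1, mu0, sigma0, mu1, sigma1, trials=2000, seed=0):
    """Monte Carlo estimates for both error estimators (requires n0, n1 >= 2)."""
    rng = random.Random(seed)
    prior0 = n0 / (n0 + n1)  # assumption: class priors match the sampling ratio
    resub, loo, actual = [], [], []
    for _ in range(trials):
        x0 = [rng.gauss(mu0, sigma0) for _ in range(n0)]
        x1 = [rng.gauss(mu1, sigma1) for _ in range(n1)]
        m0, m1 = sum(x0) / n0, sum(x1) / n1
        actual.append(actual_error(m0, m1, mu0, sigma0, mu1, sigma1, prior0))
        resub.append(resubstitution(x0, x1))
        loo.append(leave_one_out(x0, x1))
    return {"resub": summarize(resub, actual), "loo": summarize(loo, actual)}


if __name__ == "__main__":
    # Unequal variances (1.0 vs 2.0) and unequal sample sizes (10 vs 15).
    for name, (bias, dev_var, rms) in simulate(10, 15, 0.0, 1.0, 1.0, 2.0).items():
        print(f"{name}: bias={bias:+.4f} deviation variance={dev_var:.4f} RMS={rms:.4f}")
```

Rerunning `simulate` over a range of `n0` and `n1` values and reading off where the RMS falls below a chosen tolerance gives a rough, simulation-based analogue of the sample-size determination mentioned in the abstract; the exact expressions in the paper would replace the sampling loop.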