Exact correlation between actual and estimated errors in discrete classification

  • Authors: Ulisses M. Braga-Neto; Edward R. Dougherty

  • Affiliations: Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA; Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA and Computational Biology Division, Translational Genomics Research Institute, Phoenix, AZ, USA an ...

  • Venue: Pattern Recognition Letters
  • Year: 2010

Quantified Score

Hi-index 0.10

Abstract

Discrete classification problems are important in pattern recognition applications. The most commonly used discrete classification rule is the discrete histogram rule. In this letter, we provide exact expressions for the correlation coefficient between the actual error and the resubstitution and leave-one-out cross-validation error estimators for the discrete histogram rule. We show with an example that correlations between actual and estimated errors are generally poor, and that leave-one-out cross-validation can in fact display negative correlation when sample sizes are small and classifier complexity is large. We observe that correlation decreases with increasing classifier complexity, and that increasing the sample size does not necessarily increase correlation. The exact expressions given here can be computed reasonably fast for a given sample size, dimensionality, and set of model parameters. This is useful because, as also illustrated in this letter, Monte Carlo approximations of the correlation coefficient are generally poor, even with a large number of simulated data sets.
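
To make the quantities in the abstract concrete, the following is a minimal Python sketch of the Monte Carlo approach that the letter contrasts with its exact expressions. It is not the authors' method and does not reproduce their formulas. It assumes a model with class-0 prior `c0` and per-bin class-conditional probabilities `p` and `q`, and it assumes ties in a bin are assigned to class 0. The function names and the example parameters are hypothetical and are not taken from the letter.

```python
import random
from math import sqrt


def simulate_once(n, c0, p, q, rng):
    """Draw one sample of size n; return (actual, resub, loo) error values."""
    b = len(p)
    bins = list(range(b))
    u = [0] * b  # class-0 counts per bin
    v = [0] * b  # class-1 counts per bin
    for _ in range(n):
        if rng.random() < c0:
            u[rng.choices(bins, weights=p)[0]] += 1
        else:
            v[rng.choices(bins, weights=q)[0]] += 1
    c1 = 1.0 - c0
    # Histogram rule: a bin is labeled 1 only if class 1 strictly outnumbers
    # class 0 (ties go to class 0; this tie-break is an assumption).
    actual = sum(c0 * p[i] if v[i] > u[i] else c1 * q[i] for i in bins)
    # Resubstitution: fraction of training points in the minority of their bin.
    resub = sum(min(u[i], v[i]) for i in bins) / n
    # Leave-one-out: a point errs if, after removing it, its bin votes against it.
    loo = sum(u[i] * (v[i] >= u[i]) + v[i] * (u[i] >= v[i] - 1) for i in bins) / n
    return actual, resub, loo


def pearson(xs, ys):
    """Sample Pearson correlation; returns NaN if either variance is zero."""
    m = len(xs)
    mx, my = sum(xs) / m, sum(ys) / m
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def monte_carlo_correlations(n, c0, p, q, reps, seed=0):
    """Estimate corr(actual, resub) and corr(actual, loo) over reps samples."""
    rng = random.Random(seed)
    runs = [simulate_once(n, c0, p, q, rng) for _ in range(reps)]
    actual, resub, loo = zip(*runs)
    return pearson(actual, resub), pearson(actual, loo)


if __name__ == "__main__":
    # Hypothetical 4-bin model, chosen only for illustration.
    p = [0.4, 0.3, 0.2, 0.1]
    q = [0.1, 0.2, 0.3, 0.4]
    print(monte_carlo_correlations(n=20, c0=0.5, p=p, q=q, reps=2000))
```

Rerunning the sketch with different seeds gives a rough sense of how much such Monte Carlo estimates can fluctuate, which is the practical motivation the abstract gives for exact expressions.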