Relationship between the accuracy of classifier error estimation and complexity of decision boundary

Authors:
Esmaeil Atashpaz-Gargari;Chao Sima;Ulisses M. Braga-Neto;Edward R. Dougherty
Affiliations:
Department of Electrical and Computer Engineering, Texas A&M University, United States;Computational Biology Division, Translational Genomics Research Institute, Phoenix, AZ, United States;Department of Electrical and Computer Engineering, Texas A&M University, United States;Department of Electrical and Computer Engineering, Texas A&M University, United States and Computational Biology Division, Translational Genomics Research Institute, Phoenix, AZ, United States and ...
Venue:
Pattern Recognition
Year:
2013

Abstract

Error estimation is a crucial part of classification methodology, and it becomes problematic with small samples. We demonstrate here that the complexity of the decision boundary plays a key role in the performance of error estimation methods. First, a model is developed that quantifies the complexity of a classification problem purely in terms of the geometry of the decision boundary, without relying on the Bayes error. Then, this model is used in a simulation study, in two- and three-dimensional spaces, to analyze how the bias and root-mean-square (RMS) error of five widely used error estimation methods vary with the complexity of the decision boundary. The methods are resubstitution, leave-one-out, 10-fold cross-validation with repetition, 0.632 bootstrap, and bolstered resubstitution. Each estimator is implemented with three classification rules: quadratic discriminant analysis (QDA), 3-nearest-neighbor (3NN), and a two-layer neural network (NNet). The results show that all the estimation methods lose accuracy as complexity increases.
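To make the quantities in the abstract concrete, the following is a minimal sketch of how the bias and RMS of an error estimator might be measured by simulation. It is not the authors' code or complexity model. The two-Gaussian data generator, the sample sizes, the seeds, and every function name are assumptions chosen for illustration. Only three of the five estimators are covered (resubstitution, leave-one-out, and one common pooled variant of the 0.632 bootstrap), only the 3NN rule is used, and the true error is approximated on a large independent test sample.

```python
import math
import random

def knn_predict(train_x, train_y, point, k=3):
    """Majority vote (labels 0/1) among the k training points nearest to `point`."""
    order = sorted(range(len(train_x)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], point)))
    votes = sum(train_y[i] for i in order[:k])
    return 1 if 2 * votes > k else 0

def error_rate(train_x, train_y, test_x, test_y, k=3):
    """Fraction of test points misclassified by k-NN designed on the training data."""
    wrong = sum(knn_predict(train_x, train_y, p, k) != t for p, t in zip(test_x, test_y))
    return wrong / len(test_x)

def resubstitution(x, y, k=3):
    """Error counted on the same sample that was used to design the classifier."""
    return error_rate(x, y, x, y, k)

def leave_one_out(x, y, k=3):
    """Each point is classified by a rule designed on the remaining n - 1 points."""
    wrong = sum(knn_predict(x[:i] + x[i + 1:], y[:i] + y[i + 1:], x[i], k) != y[i]
                for i in range(len(x)))
    return wrong / len(x)

def bootstrap_632(x, y, k=3, n_boot=50, rng=None):
    """0.368 * resubstitution + 0.632 * zero-bootstrap (pooled out-of-bag errors)."""
    rng = rng or random.Random(0)
    n = len(x)
    wrong = total = 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # resample with replacement
        chosen = set(idx)
        held_out = [i for i in range(n) if i not in chosen]
        bx, by = [x[i] for i in idx], [y[i] for i in idx]
        wrong += sum(knn_predict(bx, by, x[i], k) != y[i] for i in held_out)
        total += len(held_out)
    zero_bootstrap = wrong / total if total else 0.0
    return 0.368 * resubstitution(x, y, k) + 0.632 * zero_bootstrap

def make_sample(n_per_class, rng, shift=1.5):
    """Assumed toy model: two unit-variance 2-D Gaussians whose means differ by `shift`."""
    x, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            x.append((rng.gauss(label * shift, 1.0), rng.gauss(label * shift, 1.0)))
            y.append(label)
    return x, y

def bias_and_rms(estimator, n_trials=20, n_per_class=10, n_test_per_class=200, seed=0):
    """Bias = mean(estimate - true error); RMS = sqrt(mean((estimate - true error)^2))."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_trials):
        x, y = make_sample(n_per_class, rng)
        test_x, test_y = make_sample(n_test_per_class, rng)
        true_error = error_rate(x, y, test_x, test_y)  # held-out approximation
        diffs.append(estimator(x, y) - true_error)
    bias = sum(diffs) / len(diffs)
    rms = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return bias, rms

if __name__ == "__main__":
    for name, est in [("resubstitution", resubstitution),
                      ("leave-one-out", leave_one_out),
                      ("0.632 bootstrap", bootstrap_632)]:
        bias, rms = bias_and_rms(est)
        print(f"{name}: bias={bias:+.3f} rms={rms:.3f}")
```

Studying the effect the abstract describes would require replacing `make_sample` with a data model whose decision-boundary complexity can be varied, which this sketch does not attempt.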