Data complexity assessment in undersampled classification of high-dimensional biomedical data

Authors:
R. Baumgartner;R. L. Somorjai
Affiliations:
Institute for Biodiagnostics, National Research Council Canada, 435 Ellice Avenue, Winnipeg, Man., Canada R3B 1Y6;Institute for Biodiagnostics, National Research Council Canada, 435 Ellice Avenue, Winnipeg, Man., Canada R3B 1Y6
Venue:
Pattern Recognition Letters
Year:
2006

Abstract

Regularized linear classifiers have been applied successfully to undersampled biomedical classification problems, i.e. problems with small sample size and high dimensionality. In addition, data complexity measures have been proposed to assess the competence of a classifier in a particular context. Our work was motivated by Elden's analysis of ill-posed regression problems and by the interpretation of linear discriminant analysis as a mean square error classifier. Using singular value decomposition (SVD) analysis, we define a discriminatory power spectrum and show that it provides a useful means of assessing data complexity in undersampled classification problems. On five real-life biomedical data sets of increasing difficulty, we demonstrate how the data complexity of a classification problem relates to the performance of regularized linear classifiers. We show that the concentration of discriminatory power, as manifested in the discriminatory power spectrum, is a deciding factor in the success of regularized linear classifiers on undersampled classification problems. As a practical outcome, the proposed data complexity assessment may facilitate the choice of a classifier for a given undersampled problem.
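The abstract does not give the formula for the discriminatory power spectrum, so the following is only an illustrative sketch under an assumed definition: the squared projections of the centered class-label vector onto the left singular vectors of the centered data matrix, normalized by the total label energy. The function name `discriminatory_power_spectrum`, the synthetic data, and the rank tolerance are hypothetical choices made for this example and are not taken from the paper.

```python
import numpy as np

def discriminatory_power_spectrum(X, y, rel_tol=1e-10):
    """Assumed definition (not confirmed by the abstract):
    fraction of label energy carried by each left singular vector of X."""
    Xc = X - X.mean(axis=0)            # center each feature
    yc = y - y.mean()                  # center the class labels
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    keep = s > rel_tol * s[0]          # drop numerically null directions
    proj = U[:, keep].T @ yc           # projections of labels onto u_i
    return s[keep], proj**2 / (yc @ yc)

# Synthetic undersampled problem: 20 samples, 500 features (hypothetical data).
rng = np.random.default_rng(0)
n, p = 20, 500
y = np.repeat([-1.0, 1.0], n // 2)     # two balanced classes
X = rng.normal(size=(n, p))
X[:, :50] += 3.0 * y[:, None]          # class signal in the first 50 features

s, dps = discriminatory_power_spectrum(X, y)
for i in range(3):
    print(f"component {i}: singular value {s[i]:.1f}, power fraction {dps[i]:.3f}")
```

Under this assumed definition, a spectrum concentrated in the leading components (those with large singular values) would correspond to a problem that a regularized linear classifier can handle well, whereas discriminatory power spread over many small-singular-value components would indicate a harder problem.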