Motivation: Given the joint feature-label distribution, increasing the number of features always results in decreased classification error; however, this is not the case when a classifier is designed from sample data via a classification rule. Typically (but not always), for a fixed sample size, the error of a designed classifier decreases and then increases as the number of features grows. The potential downside of using too many features is most critical for small samples, which are commonplace for gene-expression-based classifiers for phenotype discrimination. For a fixed sample size and feature-label distribution, the issue is to find an optimal number of features.

Results: Since the distribution of the error as a function of the number of features and sample size is known only in rare cases, this study employs simulation for various feature-label distributions and classification rules, across a wide range of sample and feature-set sizes. To find the optimal number of features as a function of sample size, it employs massively parallel computation. Seven classifiers are treated: 3-nearest-neighbor, Gaussian kernel, linear support vector machine, polynomial support vector machine, perceptron, regular histogram and linear discriminant analysis. Three Gaussian-based models are considered: linear, nonlinear and bimodal. In addition, real patient data from a large breast-cancer study are considered. To mitigate the combinatorial search for optimal feature sets, and to model the situation in which subsets of genes are co-regulated with correlation internal to these subsets, we assume that the covariance matrix of the features is blocked, with each block corresponding to a group of correlated features. Altogether, the study yields a large number of error surfaces across these cases; these are provided in full on a companion website, which is meant to serve as a resource for those working with small-sample classification.

Availability: For the companion website, please visit http://public.tgen.org/tamu/ofs/

Contact: e-dougherty@ee.tamu.edu
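To make the setup concrete, the following is a minimal sketch, not the authors' code, of the kind of simulation the abstract describes: two Gaussian classes with a blocked covariance matrix, a fixed small training-sample size, and the error of a designed LDA classifier estimated by Monte Carlo on large test sets as the feature set grows. All specifics here (block size, within-block correlation, per-feature mean separation, sample sizes, repetition counts) are illustrative assumptions, not the paper's parameters.

```python
# A minimal sketch, NOT the authors' code: it illustrates the peaking
# phenomenon for LDA under a hypothetical two-class Gaussian linear model
# with a blocked (block-diagonal) covariance matrix. Every parameter below
# is an illustrative assumption.
import numpy as np
from numpy.random import default_rng
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = default_rng(0)

def blocked_cov(d, block=5, rho=0.25):
    """Block-diagonal covariance: features within a block have correlation rho."""
    sigma = np.eye(d)
    for start in range(0, d, block):
        end = min(start + block, d)
        sigma[start:end, start:end] = rho                    # fill the block with rho
        np.fill_diagonal(sigma[start:end, start:end], 1.0)   # unit variances
    return sigma

def sample(n, mu, sigma):
    """Draw n points per class from Gaussians with means -mu and +mu."""
    x0 = rng.multivariate_normal(-mu, sigma, n)
    x1 = rng.multivariate_normal(mu, sigma, n)
    return np.vstack([x0, x1]), np.repeat([0, 1], n)

d_max, n_train = 30, 20                  # fixed, small training size per class
mu_full = np.full(d_max, 0.25)           # equal marginal separation per feature
sigma_full = blocked_cov(d_max)

errors = []
for d in range(1, d_max + 1):            # grow the feature set one feature at a time
    mu, sigma = mu_full[:d], sigma_full[:d, :d]
    err = 0.0
    for _ in range(100):                 # Monte Carlo over training samples
        x_tr, y_tr = sample(n_train, mu, sigma)
        x_te, y_te = sample(2000, mu, sigma)   # large test set approximates true error
        clf = LinearDiscriminantAnalysis().fit(x_tr, y_tr)
        err += np.mean(clf.predict(x_te) != y_te)
    errors.append(err / 100)

best = int(np.argmin(errors)) + 1
print(f"estimated optimal number of features for n={n_train} per class: {best}")
```

Plotting errors against the number of features typically reproduces the shape described in the abstract: the designed classifier's error first falls and then rises, with the minimum marking the optimal feature-set size for that sample size.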