Feature selection algorithms to find strong genes

Authors:
Paulo J. S. Silva;Ronaldo F. Hashimoto;Seungchan Kim;Junior Barrera;Leônidas O. Brandão;Edward Suh;Edward R. Dougherty
Affiliations:
Department of Computer Science, Institute of Math./ Statistics--IME, University of São Paulo, Rua do Matão 1010, 05508-090 São Paulo, Brazil;Department of Computer Science, Institute of Math./ Statistics--IME, University of São Paulo, Rua do Matão 1010, 05508-090 São Paulo, Brazil;Cancer Genetics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892-4470, USA;Department of Computer Science, Institute of Math./ Statistics--IME, University of São Paulo, Rua do Matão 1010, 05508-090 São Paulo, Brazil;Department of Computer Science, Institute of Math./ Statistics--IME, University of São Paulo, Rua do Matão 1010, 05508-090 São Paulo, Brazil;Division of Computational Biology, Center for Information Technology, National Institutes of Health, Bethesda, MD 20892-4470, USA;Department of Electrical Engineering, Texas A&M University, College Station, TX 77840, USA
Venue:
Pattern Recognition Letters
Year:
2005

Abstract

cDNA microarray technology allows us to estimate the expression of thousands of genes in a given tissue. It is then natural to use this information to classify different cell states, such as healthy versus diseased, or one type of cancer versus another. However, the number of microarray samples is usually very small, which leads to a classification problem with only tens of samples and thousands of features. Recently, Kim et al. proposed using a parameterized distribution based on the original sample set to attenuate this difficulty. Genes that contribute to good classifiers in such a setting are called strong. In this paper, we investigate how to use feature selection techniques to speed up the search for strong genes. The idea is to use a feature selection algorithm to filter the gene set before applying the original strong-feature technique, which is based on a combinatorial search. The filtering helps us find very good strong gene sets without resorting to supercomputers. We tested several filter options and compared the strong genes obtained with those found by the original full combinatorial search.
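
The filter-then-search idea can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names (`filter_genes`, `loo_error`, `search_subsets`), the toy data, the mean-difference filter score, and the leave-one-out nearest-centroid error are stand-ins, not the filters tested in the paper nor the parameterized-distribution criterion of Kim et al.

```python
from itertools import combinations
from statistics import mean, pstdev


def filter_genes(X, y, keep):
    """Rank genes by a simple two-class separation score; return the top `keep` indices.

    Assumption: this univariate score is a placeholder for the paper's filters.
    """
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, label in zip(X, y) if label == 0]
        b = [row[g] for row, label in zip(X, y) if label == 1]
        spread = pstdev(a) + pstdev(b) + 1e-12  # avoid division by zero
        scores.append((abs(mean(a) - mean(b)) / spread, g))
    return [g for _, g in sorted(scores, reverse=True)[:keep]]


def loo_error(X, y, genes):
    """Leave-one-out error of a nearest-centroid classifier restricted to `genes`.

    Assumption: a stand-in for the strong-gene error criterion.
    """
    errors = 0
    for i in range(len(X)):
        dist = {}
        for label in (0, 1):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            centroid = [mean(r[g] for r in rows) for g in genes]
            dist[label] = sum((X[i][g] - c) ** 2 for g, c in zip(genes, centroid))
        errors += min(dist, key=dist.get) != y[i]
    return errors / len(X)


def search_subsets(X, y, candidates, size):
    """Exhaustively score every `size`-gene subset of the filtered candidates."""
    return min(combinations(sorted(candidates), size),
               key=lambda subset: loo_error(X, y, subset))


# Toy data (hypothetical): 6 samples, 5 genes; genes 1 and 3 separate the classes.
X = [
    [0.5, 1.0, 0.3, 5.0, 0.9],
    [0.4, 1.2, 0.8, 5.5, 0.1],
    [0.6, 0.9, 0.5, 4.8, 0.5],
    [0.5, 3.0, 0.4, 1.0, 0.6],
    [0.4, 3.1, 0.7, 1.2, 0.2],
    [0.6, 2.9, 0.6, 0.9, 0.8],
]
y = [0, 0, 0, 1, 1, 1]

selected = filter_genes(X, y, keep=3)        # cheap filter over all genes
best = search_subsets(X, y, selected, size=2)  # combinatorial search on survivors only
print(selected, best, loo_error(X, y, best))
```

The point of the design is the cost reduction: the exhaustive search visits only the pairs among the 3 retained genes (3 subsets) rather than all pairs among the 5 original genes (10 subsets), and the gap grows combinatorially with thousands of genes.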