Feature selection algorithms to find strong genes

  • Authors:
  • Paulo J. S. Silva;Ronaldo F. Hashimoto;Seungchan Kim;Junior Barrera;Leônidas O. Brandão;Edward Suh;Edward R. Dougherty

  • Affiliations:
  • Department of Computer Science, Institute of Math./ Statistics--IME, University of São Paulo, Rua do Matao 1010, 05508-090 Sao Paulo, Brazil;Department of Computer Science, Institute of Math./ Statistics--IME, University of São Paulo, Rua do Matao 1010, 05508-090 Sao Paulo, Brazil;Cancer Genetics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892-4470, USA;Department of Computer Science, Institute of Math./ Statistics--IME, University of São Paulo, Rua do Matao 1010, 05508-090 Sao Paulo, Brazil;Department of Computer Science, Institute of Math./ Statistics--IME, University of São Paulo, Rua do Matao 1010, 05508-090 Sao Paulo, Brazil;Division of Computational Biology, Center for Information Technology, National Institutes of Health, Bethesda, MD 20892-4470, USA;Department of Electrical Engineering, Texas A&M University, College Station, TX 77840, USA

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 2005

Quantified Score

Hi-index 0.10

Abstract

cDNA microarray technology allows us to estimate the expression of thousands of genes in a given tissue. It is natural, then, to use such information to classify different cell states, such as healthy versus diseased, or one particular type of cancer versus another. However, the number of microarray samples is usually very small, which leads to a classification problem with only tens of samples and thousands of features. Recently, Kim et al. proposed using a parameterized distribution based on the original sample set to attenuate this difficulty. Genes that contribute to good classifiers in such a setting are called strong. In this paper, we investigate how to use feature selection techniques to speed up the search for strong genes. The idea is to use a feature selection algorithm to filter the gene set before applying the original strong feature technique, which is based on a combinatorial search. The filtering helps us find very good strong gene sets without resorting to supercomputers. We have tested several filter options and compared the strong genes obtained with those found by the original full combinatorial search.
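
The two-stage idea in the abstract (filter the genes first, then search exhaustively over the survivors) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the signal-to-noise filter score, the nearest-centroid classifier with leave-one-out error used to rank subsets, the toy data, and the function names are hypothetical stand-ins, not the filters or the strong-feature criterion of Kim et al. used in the paper.

```python
from itertools import combinations
from statistics import mean, pstdev


def snr_score(values, labels):
    """Assumed univariate filter score: |mean0 - mean1| / (std0 + std1)."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    return abs(mean(a) - mean(b)) / (pstdev(a) + pstdev(b) + 1e-12)


def filter_genes(X, labels, keep):
    """Stage 1: indices of the `keep` best-scoring genes (columns of X)."""
    n_genes = len(X[0])
    scores = [snr_score([row[g] for row in X], labels) for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: -scores[g])[:keep]


def loo_error(X, labels, genes):
    """Leave-one-out error of a nearest-centroid classifier on `genes`.

    This is a stand-in criterion, not the paper's strong-feature measure.
    """
    errors = 0
    for i, (row, y) in enumerate(zip(X, labels)):
        dist = {}
        for c in (0, 1):
            # Class centroid computed without the held-out sample i.
            rows = [r for j, (r, l) in enumerate(zip(X, labels))
                    if j != i and l == c]
            centroid = [mean(r[g] for r in rows) for g in genes]
            dist[c] = sum((row[g] - m) ** 2 for g, m in zip(genes, centroid))
        errors += min(dist, key=dist.get) != y
    return errors / len(X)


def best_subset(X, labels, candidates, size):
    """Stage 2: exhaustive search over all `size`-gene subsets of candidates."""
    subsets = combinations(sorted(candidates), size)
    best = min(subsets, key=lambda s: loo_error(X, labels, s))
    return best, loo_error(X, labels, best)


# Toy data (made up): 6 samples x 5 genes; genes 0 and 2 separate the classes.
X = [
    [1.0, 5.0, 0.9, 3.0, 2.0],
    [1.2, 4.0, 1.1, 2.0, 2.1],
    [0.8, 6.0, 1.0, 4.0, 1.9],
    [3.0, 5.1, 3.1, 3.1, 2.0],
    [3.2, 4.2, 2.9, 2.2, 2.2],
    [2.8, 5.9, 3.0, 3.9, 1.8],
]
labels = [0, 0, 0, 1, 1, 1]

candidates = filter_genes(X, labels, keep=3)
subset, err = best_subset(X, labels, candidates, size=2)
print(candidates, subset, err)
```

The point of the filter is the size of the search space: choosing 3 genes from 6000 gives roughly 3.6 × 10^10 subsets, while choosing 3 from 100 filtered genes gives 161,700. The numbers 6000 and 100 are illustrative and not taken from the paper.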