A method for feature selection on microarray data using support vector machine

  • Authors:
  • Xiao Bing Huang;Jian Tang

  • Affiliations:
  • Computer Science Department, Memorial University of Newfoundland, St. John’s, NL, Canada;Computer Science Department, Memorial University of Newfoundland, St. John’s, NL, Canada

  • Venue:
  • DaWaK'06 Proceedings of the 8th international conference on Data Warehousing and Knowledge Discovery
  • Year:
  • 2006

Abstract

The data collected from a typical microarray experiment usually consist of tens of samples and thousands of genes (i.e., features). Usually only a small subset of these features is both relevant and non-redundant for differentiating the samples. Identifying an optimal subset of relevant genes is crucial for accurate classification of samples. In this paper, we propose a method for selecting a relevant gene subset from microarray gene expression data. Our method is based on the gap tolerant classifier, a variation of the support vector machine (SVM), and uses a hill-climbing search strategy. Unlike most other hill-climbing approaches, which use classification accuracy alone as the criterion for feature selection, the proposed method uses a mixture of accuracy and SVM margin to select features. Our experimental results show that this strategy is effective both in selecting relevant features and in eliminating redundant ones.
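The abstract does not give the exact scoring formula, so the following is only a minimal sketch of the general idea: greedy forward hill-climbing in which each candidate feature subset is scored by a weighted mix of accuracy and a margin term. Everything named here is an assumption made for illustration: the weight `alpha`, the helper names, the use of training accuracy, and the margin-to-diameter ratio (loosely motivated by the gap tolerant classifier, with the largest pairwise distance as a crude proxy for the diameter of the enclosing sphere). To keep the sketch dependency-free, a hyperplane halfway between the class means stands in for a trained SVM; a real implementation would train an SVM or gap tolerant classifier at that step.

```python
from itertools import combinations
from math import dist, sqrt


def project(X, feats):
    """Keep only the columns listed in feats."""
    return [[row[j] for j in feats] for row in X]


def centroid_hyperplane(Z, y):
    """Stand-in for SVM training (assumption): hyperplane midway between class means."""
    pos = [z for z, label in zip(Z, y) if label == 1]
    neg = [z for z, label in zip(Z, y) if label == -1]
    mu_p = [sum(c) / len(pos) for c in zip(*pos)]
    mu_n = [sum(c) / len(neg) for c in zip(*neg)]
    w = [p - n for p, n in zip(mu_p, mu_n)]
    b = -sum(wi * (p + n) / 2 for wi, p, n in zip(w, mu_p, mu_n))
    return w, b


def score(X, y, feats, alpha=0.5):
    """Hypothetical criterion: alpha * accuracy + (1 - alpha) * margin / diameter."""
    Z = project(X, feats)
    w, b = centroid_hyperplane(Z, y)
    norm = sqrt(sum(wi * wi for wi in w))
    if norm == 0:  # class means coincide: the subset carries no linear signal
        return 0.0
    # Signed distance of every sample to the hyperplane (positive = correct side).
    signed = [label * (sum(wi * zi for wi, zi in zip(w, z)) + b) / norm
              for z, label in zip(Z, y)]
    accuracy = sum(s > 0 for s in signed) / len(y)  # training accuracy
    diameter = max(dist(a, c) for a, c in combinations(Z, 2))
    margin = max(0.0, 2 * min(signed))  # zero if any sample is misclassified
    margin_ratio = margin / diameter if diameter > 0 else 0.0
    return alpha * accuracy + (1 - alpha) * margin_ratio


def hill_climb(X, y, alpha=0.5, tol=1e-9):
    """Greedy forward selection: add the best feature while the score improves."""
    selected, best = [], 0.0
    remaining = set(range(len(X[0])))
    while remaining:
        scored = [(score(X, y, selected + [j], alpha), j) for j in sorted(remaining)]
        top_score, top_feat = max(scored, key=lambda t: t[0])
        if top_score <= best + tol:  # no strict improvement: stop climbing
            break
        selected.append(top_feat)
        remaining.remove(top_feat)
        best = top_score
    return selected, best


# Toy data: feature 0 separates the classes, feature 1 is noise,
# feature 2 is an exact copy of feature 0 (redundant).
X = [[4, 0, 4], [5, 0, 5], [6, 3, 6], [0, 0, 0], [1, 3, 1], [2, 0, 2]]
y = [1, 1, 1, -1, -1, -1]
selected, best = hill_climb(X, y)
print(selected, round(best, 4))
```

On this toy data the noise feature lowers the margin-to-diameter ratio and the duplicated feature leaves it unchanged, so neither is added after feature 0. With accuracy alone (`alpha=1.0`) all three candidate pairs would tie once feature 0 is chosen, which illustrates why a margin term can help discriminate between subsets that classify equally well.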