Informative gene selection and tumor classification by null space LDA for microarray data

  • Authors:
  • Feng Yue; Kuanquan Wang; Wangmeng Zuo

  • Affiliations:
  • Biocomputing Research Center, The School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China (all three authors)

  • Venue:
  • ESCAPE'07: Proceedings of the First International Conference on Combinatorics, Algorithms, Probabilistic and Experimental Methodologies
  • Year:
  • 2007

Abstract

DNA microarray technology can monitor the expression of thousands of genes in a single experiment. One important application of this high-throughput gene expression data is classifying samples into known categories. Because the number of genes often exceeds the number of samples, classical classification methods do not work well in this setting. Moreover, many genes are irrelevant or redundant and lower classification accuracy, so a gene selection step is necessary; classification using the selected genes is expected to be more accurate. This paper proposes a novel method for informative gene selection and sample classification on gene expression data. The method is based on Linear Discriminant Analysis (LDA) in both the regular space and the null space of the within-class scatter matrix. Genes with smaller coefficients in the optimal projection basis vectors are recursively filtered out, so the remaining genes become increasingly informative. Experiments on the leukemia and colon datasets show that the genes in the selected subset are much less correlated and more discriminative than those selected by classical methods.
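The abstract describes the pipeline only at a high level. The sketch below illustrates one common formulation of null-space LDA (restrict to the range of the total scatter St, take the null space of the within-class scatter Sw inside it, then maximize the between-class scatter there), followed by a recursive filter that drops the genes with the smallest absolute coefficients in the resulting projection vectors. It is not the authors' implementation: it covers only the null-space step, not the regular-space LDA the paper also uses, and the function names `null_space_lda` and `recursive_gene_filter`, the per-gene score (sum of absolute coefficients across discriminant vectors), the `drop_frac` schedule and the tolerance are hypothetical choices made for illustration.

```python
import numpy as np


def null_space_lda(X, y, tol=1e-10):
    """Sketch of null-space LDA. X is (n_samples, n_genes), y holds class labels.
    Returns W (n_genes, k) with orthonormal columns lying in the null space of
    the within-class scatter Sw, restricted to the range of the total scatter St."""
    classes = np.unique(y)
    mean = X.mean(axis=0)
    # Step 1: orthonormal basis P of range(St); its rank is at most n - 1.
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    P = vt[s > tol * s[0]].T                           # (d, r)
    # Within-class deviations Hw (Sw = Hw^T Hw) and size-weighted class-mean
    # offsets Hb (Sb = Hb^T Hb), both expressed in the basis P.
    Hw = np.vstack([X[y == c] - X[y == c].mean(axis=0) for c in classes]) @ P
    Hb = np.vstack([np.sqrt(np.sum(y == c)) * (X[y == c].mean(axis=0) - mean)
                    for c in classes]) @ P
    # Step 2: orthonormal basis Q of the null space of Sw inside range(St).
    _, sw, vtw = np.linalg.svd(Hw, full_matrices=True)
    rank_w = int(np.sum(sw > tol * sw[0]))
    Q = vtw[rank_w:].T                                 # (r, m)
    if Q.shape[1] == 0:
        raise ValueError("Sw is nonsingular here; null-space LDA does not apply")
    # Step 3: within that null space, keep the directions with nonzero
    # between-class scatter (at most C - 1 of them).
    _, sb, vtb = np.linalg.svd(Hb @ Q, full_matrices=False)
    V = vtb[sb > tol * sb[0]].T                        # (m, k)
    return P @ Q @ V                                   # (d, k)


def recursive_gene_filter(X, y, n_keep, drop_frac=0.5):
    """Hypothetical recursive filter: refit null-space LDA on the surviving genes
    and drop the fraction drop_frac with the smallest summed |coefficient|.
    Keep n_keep >= n_samples so that Sw stays singular for generic data."""
    if not 0 < drop_frac < 1:
        raise ValueError("drop_frac must lie strictly between 0 and 1")
    genes = np.arange(X.shape[1])
    while genes.size > n_keep:
        W = null_space_lda(X[:, genes], y)
        score = np.abs(W).sum(axis=1)                  # hypothetical per-gene score
        n_next = max(n_keep, int(genes.size * (1 - drop_frac)))
        top = np.argsort(score)[::-1][:n_next]         # highest-scoring genes
        genes = genes[np.sort(top)]                    # keep original gene order
    return genes


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 200))                     # 20 samples, 200 synthetic "genes"
    y = np.repeat([0, 1], 10)
    X[y == 1, :5] += 8.0                               # genes 0-4 carry the class signal
    print(recursive_gene_filter(X, y, n_keep=50))
```

In the null space of Sw, every sample of a class projects to the same point, which is why this formulation is attractive when genes far outnumber samples. As the gene set shrinks toward the sample count, that null space becomes small and the projection vectors become noisier; once it disappears, regular-space LDA would be needed instead.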