Using most similarity tree based clustering to select the top most discriminating genes for cancer detection

  • Authors:
  • Xinguo Lu;Yaping Lin;Xiaolin Yang;Lijun Cai;Haijun Wang;Gustaph Sanga

  • Affiliations:
  • College of Computer and Communication, Hunan University, Changsha, China;College of Computer and Communication, Hunan University, Changsha, China;College of Computer and Communication, Hunan University, Changsha, China;College of Computer and Communication, Hunan University, Changsha, China;College of Computer and Communication, Hunan University, Changsha, China;College of Computer and Communication, Hunan University, Changsha, China

  • Venue:
  • ICAISC'06 Proceedings of the 8th international conference on Artificial Intelligence and Soft Computing
  • Year:
  • 2006

Abstract

The development of DNA array technology makes cancer detection from DNA array expression data feasible. However, such research is usually plagued by the “curse of dimensionality”, and discriminative power is seriously weakened by the noise and redundancy abundant in these datasets. This paper proposes a hybrid gene selection method for cancer detection based on clustering of most similarity tree (CMST). With this method, a number of non-redundant clusters are obtained, and the most discriminating gene is selected from each cluster. These discriminating genes are then used to train a perceptron, which yields very efficient classification. In CMST, the Gap statistic is used to determine the optimal similarity measure λ and the number of clusters. A gene selection method based on optimal self-adaptive CMST (OS-CMST) is also presented for cancer detection. Experiments show that gene pattern pre-processing based on CMST not only significantly reduces the dimensionality of the attributes but also improves the classification rate in cancer detection, and that the selection scheme based on OS-CMST can acquire the top most discriminating genes.
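
The pipeline outlined in the abstract (cluster the genes, keep one gene per cluster, train a perceptron) can be illustrated with a minimal Python sketch. The abstract does not specify the CMST construction, so everything below is an assumption made for illustration, not the authors' implementation: similarity is taken to be the absolute Pearson correlation between genes, clusters are the connected components obtained by keeping only gene pairs with similarity of at least λ (the same partition a maximum-similarity spanning tree gives when its edges below λ are cut), and the "most discriminating" gene is chosen with a signal-to-noise style score. The function names `threshold_clusters`, `discrimination_score`, `select_genes`, and `train_perceptron` are hypothetical, and λ is passed in by hand rather than chosen with the Gap statistic.

```python
import numpy as np


def threshold_clusters(sim, lam):
    """Label connected components of the graph with an edge where sim >= lam.

    Assumption: this stands in for cutting a maximum-similarity spanning
    tree at threshold lam; both give the same partition of the genes.
    """
    n = sim.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first traversal of one component
            i = stack.pop()
            for j in np.nonzero(sim[i] >= lam)[0]:
                if labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def discrimination_score(X, y):
    """Per-gene signal-to-noise score between classes 0 and 1 (assumed criterion)."""
    a, b = X[y == 0], X[y == 1]
    gap = np.abs(a.mean(axis=0) - b.mean(axis=0))
    return gap / (a.std(axis=0) + b.std(axis=0) + 1e-12)


def select_genes(X, y, lam):
    """Cluster genes (columns of X) and keep the top-scoring gene of each cluster."""
    sim = np.abs(np.corrcoef(X, rowvar=False))  # gene-by-gene similarity
    labels = threshold_clusters(sim, lam)
    score = discrimination_score(X, y)
    selected = []
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        selected.append(int(members[np.argmax(score[members])]))
    return sorted(selected), labels


def train_perceptron(X, y, epochs=50, lr=0.1):
    """Classic perceptron rule on labels in {0, 1}; returns weights and bias."""
    w = np.zeros(X.shape[1])
    b = 0.0
    targets = np.where(y == 1, 1.0, -1.0)
    for _ in range(epochs):
        for xi, ti in zip(X, targets):
            if ti * (xi @ w + b) <= 0:  # misclassified or on the boundary
                w += lr * ti * xi
                b += lr * ti
    return w, b


def predict(X, w, b):
    """Predict class 1 where the linear score is positive, else class 0."""
    return (X @ w + b > 0).astype(int)
```

With an expression matrix `X` (samples by genes) and binary labels `y`, one would call `selected, labels = select_genes(X, y, lam)` and then train on `X[:, selected]`. Raising `lam` splits the genes into more, smaller clusters and therefore keeps more genes; a data-driven choice of `lam`, such as the Gap statistic mentioned in the abstract, is left out of this sketch.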