Selecting marker genes for cancer classification using supervised weighted kernel clustering and the support vector machine

  • Authors:
  • Jooyong Shim; Insuk Sohn; Sujong Kim; Jae Won Lee; Paul E. Green; Changha Hwang

  • Affiliations:
  • Department of Applied Statistics, Catholic University of Daegu, Kyungbuk 712-702, Republic of Korea; Department of Statistics, Korea University, Seoul 136-701, Republic of Korea; Department of Biochemistry, College of Medicine, Hanyang University, Seoul 133-791, Republic of Korea; Department of Statistics, Korea University, Seoul 136-701, Republic of Korea; Transportation Research Institute, University of Michigan, Ann Arbor, 48109-2150, USA; Division of Information and Computer Science, Dankook University, Gyeonggido 448-701, Republic of Korea

  • Venue:
  • Computational Statistics & Data Analysis
  • Year:
  • 2009

Abstract

Recent interest in the analysis of DNA microarray data has led to the development of new methods for statistical classification. The goal is to assign a sample to the relevant diagnostic category on the basis of its gene expression profile, using existing data. When samples are classified into cancer types, however, some genes are not important, while some are more important than others. A novel algorithm is presented for selecting the relevant genes, referred to as marker genes, for cancer classification. The algorithm is based on the Support Vector Machine (SVM) and Supervised Weighted Kernel Clustering (SWKC). Its performance was investigated on a simulated data set and some real data sets, and compared with that of other well-known methods: Prediction Analysis of Microarrays (PAM), Support Vector Machine-Recursive Feature Elimination (SVM-RFE), and a Structured Polychotomous Machine (SPM). The experimental results indicate that the proposed SWKC/SVM algorithm is conceptually much simpler than other existing methods for identifying marker genes for cancer classification and performs more efficiently. The SWKC/SVM algorithm also requires much less computing time than those methods.
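
To make the marker-gene selection task concrete, the following is a minimal sketch of SVM-RFE, one of the comparison methods named in the abstract. It is not the authors' SWKC/SVM algorithm, which the abstract does not specify. The sketch follows the usual SVM-RFE recipe: train a linear SVM, drop the gene with the smallest squared weight, and repeat. The function names (`train_linear_svm`, `svm_rfe`, `simulate`), the subgradient-descent trainer, the omission of a bias term, and all parameter values are assumptions made for illustration. The simulated data are likewise hypothetical and unrelated to the data sets used in the paper.

```python
import random


def train_linear_svm(X, y, lam=0.1, epochs=100, lr=0.1):
    """Fit a linear SVM without a bias term by batch subgradient descent
    on the L2-regularised hinge loss. Returns the weight vector.
    (Illustrative trainer; any linear SVM solver could be used instead.)"""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [lam * wj for wj in w]  # gradient of the L2 penalty
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            if margin < 1.0:  # sample violates the margin
                for j in range(d):
                    grad[j] -= yi * xi[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def svm_rfe(X, y, n_keep):
    """Recursive feature elimination: retrain on the remaining genes,
    remove the one with the smallest squared weight, and repeat until
    n_keep genes are left. Returns the retained column indices."""
    active = list(range(len(X[0])))
    while len(active) > n_keep:
        X_sub = [[row[j] for j in active] for row in X]
        w = train_linear_svm(X_sub, y)
        worst = min(range(len(active)), key=lambda k: w[k] ** 2)
        del active[worst]
    return active


def simulate(n_samples=40, n_genes=12, n_informative=2, shift=2.0, seed=0):
    """Hypothetical two-class expression data: the first n_informative
    genes have class-dependent means, the remaining genes are pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = 1 if i % 2 == 0 else -1
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for j in range(n_informative):
            row[j] += label * shift
        X.append(row)
        y.append(label)
    return X, y


X, y = simulate()
selected = svm_rfe(X, y, n_keep=2)
print("genes retained:", selected)
```

In this toy setup the first two columns carry the class signal, so a reasonable selection procedure would be expected to rank them highly. Because each elimination step requires retraining the classifier, the cost of this baseline grows with the number of genes, which is the kind of computing-time concern the abstract raises.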