Towards improving fuzzy clustering using support vector machine: Application to gene expression data

  • Authors:
  • Anirban Mukhopadhyay;Ujjwal Maulik

  • Affiliations:
  • Department of Computer Science and Engineering, University of Kalyani, Kalyani 741235, India;Department of Computer Science and Engineering, Jadavpur University, Kolkata 700032, India

  • Venue:
  • Pattern Recognition
  • Year:
  • 2009

Quantified Score

Hi-index 0.01

Visualization

Abstract

Recent advancement in microarray technology permits monitoring of the expression levels of a large set of genes across a number of time points simultaneously. For extracting knowledge from such huge volume of microarray gene expression data, computational analysis is required. Clustering is one of the important data mining tools for analyzing such microarray data to group similar genes into clusters. Researchers have proposed a number of clustering algorithms in this purpose. In this article, an attempt has been made in order to improve the performance of fuzzy clustering by combining it with support vector machine (SVM) classifier. A recently proposed real-coded variable string length genetic algorithm based clustering technique and an iterated version of fuzzy C-means clustering have been utilized in this purpose. The performance of the proposed clustering scheme has been compared with that of some well-known existing clustering algorithms and their SVM boosted versions for one simulated and six real life gene expression data sets. Statistical significance test based on analysis of variance (ANOVA) followed by posteriori Tukey-Kramer multiple comparison test has been conducted to establish the statistical significance of the superior performance of the proposed clustering scheme. Moreover biological significance of the clustering solutions have been established.