Improvement of new automatic differential fuzzy clustering using SVM classifier for microarray analysis

  • Authors:
  • Indrajit Saha;Ujjwal Maulik;Sanghamitra Bandyopadhyay;Dariusz Plewczynski

  • Affiliations:
  • Interdisciplinary Centre for Mathematical and Computational Modeling (ICM), University of Warsaw, 02-106 Warsaw, Poland;Department of Computer Science and Engineering, Jadavpur University, Kolkata 700 032, West Bengal, India;The Machine Intelligence Unit, Indian Statistical Institute, Kolkata 700 108, West Bengal, India;Interdisciplinary Centre for Mathematical and Computational Modeling (ICM), University of Warsaw, 02-106 Warsaw, Poland

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2011

Quantified Score

Hi-index 12.05

Abstract

In recent years, the problem of clustering microarray data has been gaining significant attention. However, most clustering methods attempt to find groups of genes under the assumption that the number of clusters is known a priori. This motivated us to develop a new real-coded, improved differential evolution based automatic fuzzy clustering algorithm, which automatically evolves the number of clusters as well as the proper partitioning of a gene expression data set. To improve the result further, the clustering method is integrated with a support vector machine (SVM), a well-known technique for supervised learning. A fraction of the gene expression data points, selected from different clusters based on their proximity to the respective centers, is used for training the SVM. The cluster assignments of the remaining gene expression data points are thereafter determined using the trained classifier. The performance of the proposed clustering technique has been demonstrated on five gene expression data sets by comparing it with differential evolution based automatic fuzzy clustering, variable length genetic algorithm based fuzzy clustering, and the well-known Fuzzy C-Means algorithm. A statistical significance test has been carried out to establish the statistical superiority of the proposed clustering approach. A biological significance test has also been carried out using a web-based gene annotation tool to show that the proposed method is able to produce biologically relevant clusters of genes. The processed data sets and the MATLAB version of the software are available at http://bio.icm.edu.pl/~darman/IDEAFC-SVM/.
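
The SVM refinement step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' MATLAB implementation, and it rests on several assumptions: the cluster centers are taken as already computed, each point's initial cluster is approximated by its nearest center (the paper uses fuzzy memberships from the evolved partition), and a plain linear SVM trained by hinge-loss sub-gradient descent in a one-vs-rest scheme stands in for whatever SVM kernel and settings the paper uses. The function names, the toy data, and the `fraction`, `lam`, `eta`, and `epochs` values are all hypothetical.

```python
import math

def nearest_center(point, centers):
    """Return (index, distance) of the center closest to the point."""
    dists = [math.dist(point, c) for c in centers]
    k = min(range(len(centers)), key=dists.__getitem__)
    return k, dists[k]

def select_training_points(points, centers, fraction):
    """Per cluster, keep the given fraction of points nearest its center."""
    members = {k: [] for k in range(len(centers))}
    for i, p in enumerate(points):
        k, d = nearest_center(p, centers)
        members[k].append((d, i))
    train = {}  # point index -> cluster label
    for k, group in members.items():
        group.sort()  # closest to the center first
        keep = max(1, round(fraction * len(group)))
        for _, i in group[:keep]:
            train[i] = k
    return train

def train_linear_svm(xs, ys, lam=0.01, eta=0.01, epochs=300):
    """Binary linear SVM (labels +1/-1), hinge-loss sub-gradient descent."""
    w, b = [0.0] * len(xs[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            w = [wi * (1.0 - eta * lam) for wi in w]  # L2 shrinkage
            if margin < 1.0:  # hinge loss is active for this point
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
                b += eta * y
    return w, b

def refine_with_svm(points, centers, fraction=0.5):
    """Train on points near the centers, then classify the remaining points."""
    train = select_training_points(points, centers, fraction)
    xs = [points[i] for i in train]
    models = []  # one-vs-rest: one binary SVM per cluster
    for k in range(len(centers)):
        ys = [1.0 if train[i] == k else -1.0 for i in train]
        models.append(train_linear_svm(xs, ys))
    labels = []
    for i, p in enumerate(points):
        if i in train:
            labels.append(train[i])  # training points keep their cluster
        else:
            scores = [sum(wi * xi for wi, xi in zip(w, p)) + b
                      for w, b in models]
            labels.append(max(range(len(scores)), key=scores.__getitem__))
    return labels

# Toy 2-D data standing in for gene expression profiles (hypothetical).
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (1.0, 1.2), (1.5, 0.8),
          (5.0, 5.0), (5.1, 4.8), (4.9, 5.2), (4.0, 3.9), (3.8, 4.4)]
centers = [(0.5, 0.5), (4.6, 4.6)]
labels = refine_with_svm(points, centers, fraction=0.6)
print(labels)
```

The point of the design is that only the points nearest each center, which are the most confidently clustered, supervise the classifier. Points farther from the centers are then assigned by the learned decision boundary rather than by distance alone.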