Unsupervised and Supervised Learning Approaches Together for Microarray Analysis

  • Authors:
  • Indrajit Saha; Ujjwal Maulik; Sanghamitra Bandyopadhyay; Dariusz Plewczynski

  • Affiliations:
  • (Corresponding author) Interdisciplinary Centre for Mathematical and Computational Modeling, University of Warsaw, 02-106 Warsaw, Poland. indra@icm.edu.pl; Department of Computer Science and Engineering, Jadavpur University, Kolkata-700032, West Bengal, India. drumaulik@cse.jdvu.ac.in; Machine Intelligence Unit, Indian Statistical Institute, Kolkata-700108, West Bengal, India. sanghami@isical.ac.in; Interdisciplinary Centre for Mathematical and Computational Modeling, University of Warsaw, 02-106 Warsaw, Poland. darman@icm.edu.pl

  • Venue:
  • Fundamenta Informaticae
  • Year:
  • 2011

Abstract

In this article, a novel approach is introduced that combines unsupervised and supervised learning. For the unsupervised step, fuzzy clustering of microarray data is posed as a multiobjective optimization problem that simultaneously optimizes two internal fuzzy cluster validity indices to yield a set of Pareto-optimal clustering solutions. To this end, a new multiobjective differential evolution based fuzzy clustering technique has been proposed. Subsequently, for the supervised step, a fuzzy majority voting scheme together with a support vector machine is used to integrate the clustering information from all solutions in the resulting Pareto-optimal set. The performance of the proposed clustering technique has been demonstrated on five publicly available benchmark microarray data sets. A detailed comparison has been carried out with multiobjective genetic algorithm based fuzzy clustering, multiobjective differential evolution based fuzzy clustering, single-objective versions of differential evolution and genetic algorithm based fuzzy clustering, and the well-known fuzzy c-means algorithm. For the support vector machine, comparative results using four different kernel functions are also reported. A statistical significance test has been performed to establish the statistical superiority of the proposed multiobjective clustering approach. Finally, a biological significance test has been carried out using a web-based gene annotation tool to show that the proposed integrated technique is able to produce biologically relevant clusters of coexpressed genes.
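The abstract describes scoring each candidate fuzzy clustering with two internal validity indices and keeping only the Pareto-optimal (non-dominated) candidates. The sketch below illustrates that evaluation step under stated assumptions: the two indices are taken to be the fuzzy c-means objective J_m and the Xie-Beni index, both minimized (the abstract does not name the indices), and each candidate is represented by its cluster centers. The differential evolution search, fuzzy majority voting and SVM stages are omitted, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]
EPS = 1e-12  # guards against division by zero when a point sits on a center


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fcm_memberships(data: List[Point], centers: List[Point], m: float = 2.0) -> List[List[float]]:
    """Fuzzy c-means membership update; U[k][i] is the degree of point k in cluster i.

    With squared distances d_ki: u_ki = 1 / sum_j (d_ki / d_kj) ** (1 / (m - 1)).
    """
    exponent = 1.0 / (m - 1.0)
    U = []
    for x in data:
        d = [max(sq_dist(x, c), EPS) for c in centers]
        U.append([1.0 / sum((di / dj) ** exponent for dj in d) for di in d])
    return U


def evaluate(data: List[Point], centers: List[Point], m: float = 2.0) -> Tuple[float, float]:
    """Return (J_m, Xie-Beni) for one candidate; both objectives are minimized (assumption)."""
    U = fcm_memberships(data, centers, m)
    # J_m: membership-weighted within-cluster scatter.
    jm = sum(
        U[k][i] ** m * sq_dist(x, c)
        for k, x in enumerate(data)
        for i, c in enumerate(centers)
    )
    # Xie-Beni: compactness divided by n times the smallest squared center separation.
    sep = min(
        sq_dist(centers[i], centers[j])
        for i in range(len(centers))
        for j in range(i + 1, len(centers))
    )
    xb = jm / (len(data) * max(sep, EPS))
    return jm, xb


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(scores: List[Sequence[float]]) -> List[int]:
    """Indices of the non-dominated score vectors (the Pareto-optimal set)."""
    return [
        i for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
    candidates = [
        [(0.05, 0.0), (5.05, 5.0)],  # one center near each group
        [(0.0, 0.0), (0.1, 0.0)],    # both centers inside the same group
    ]
    scores = [evaluate(data, c) for c in candidates]
    print(pareto_front(scores))  # prints [0]
```

Using the Xie-Beni compactness term with exponent m reduces to the original Xie-Beni definition at m = 2. In a full multiobjective search, `pareto_front` would be applied to each population of candidate center sets, and the surviving non-dominated solutions would then feed the integration step.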