Multi-objective genetic algorithm based clustering approach and its application to gene expression data

  • Authors:
  • Tansel Özyer;Yimin Liu;Reda Alhajj;Ken Barker

  • Affiliations:
  • Department of Computer Science, University of Calgary, Calgary, Alberta, Canada;Department of Computer Science, University of Calgary, Calgary, Alberta, Canada;Department of Computer Science, University of Calgary, Calgary, Alberta, Canada;Department of Computer Science, University of Calgary, Calgary, Alberta, Canada

  • Venue:
  • ADVIS'04 Proceedings of the Third international conference on Advances in Information Systems
  • Year:
  • 2004

Abstract

Gene clustering is a common methodology for analyzing similar data based on expression trajectories. Clustering algorithms generally require the number of clusters a priori, which is hard to estimate, even for domain experts. In this paper, we use a Niched Pareto k-means Genetic Algorithm (GA) to cluster mRNA data. Running the multi-objective GA yields a Pareto-optimal front whose solution set offers alternatives for the optimal number of clusters. We analyze the clustering results under two cluster validity techniques commonly cited in the literature, namely the DB index and the SD index, which gives a ranking of the candidate numbers of clusters under each validity index. We tested the proposed clustering approach in experiments on three data sets, namely figure2data, cancer (NCI60) and Leukaemia data. The obtained results are promising; they demonstrate the applicability and effectiveness of the proposed approach.
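
The two ideas in the abstract, keeping only non-dominated clusterings and then scoring them with a validity index, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not name the GA's objectives, so the pair used here (number of clusters and a within-cluster variation value, both minimized) is an assumption, and the function names `pareto_front` and `davies_bouldin` are hypothetical. The DB index is computed in its usual Davies-Bouldin form; the SD index is not shown.

```python
import math


def pareto_front(candidates):
    """Return the candidates that no other candidate dominates.

    Each candidate is a tuple of objective values, all to be minimized.
    Assumed layout for illustration: (number_of_clusters, within_cluster_variation).
    """
    front = []
    for i, a in enumerate(candidates):
        dominated = False
        for j, b in enumerate(candidates):
            if i == j:
                continue
            no_worse = all(bv <= av for av, bv in zip(a, b))
            strictly_better = any(bv < av for av, bv in zip(a, b))
            if no_worse and strictly_better:
                dominated = True
                break
        if not dominated:
            front.append(a)
    return front


def davies_bouldin(points, labels):
    """Davies-Bouldin index of a labelled clustering (lower is better).

    Requires at least two clusters with distinct centroids.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    # Centroid of each cluster: the per-dimension mean of its members.
    centroids = {
        label: tuple(sum(dim) / len(members) for dim in zip(*members))
        for label, members in clusters.items()
    }
    # Scatter of each cluster: mean distance of its members to the centroid.
    scatter = {
        label: sum(math.dist(p, centroids[label]) for p in members) / len(members)
        for label, members in clusters.items()
    }

    names = list(clusters)
    total = 0.0
    for a in names:
        # Worst-case similarity of cluster a to any other cluster.
        total += max(
            (scatter[a] + scatter[b]) / math.dist(centroids[a], centroids[b])
            for b in names
            if b != a
        )
    return total / len(names)


# Made-up (k, variation) pairs standing in for candidate clusterings.
candidates = [(2, 10.0), (3, 6.0), (4, 6.5), (5, 3.0), (3, 8.0)]
front = pareto_front(candidates)

# Toy 2-D data with two well-separated clusters.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]
db = davies_bouldin(points, labels)
```

In this toy run the front keeps one candidate for each of k = 2, 3 and 5, which mirrors how a Pareto front offers several alternatives for the number of clusters. A validity index such as DB could then be computed for the clustering behind each front member to rank those alternatives.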