From alternative clustering to robust clustering and its application to gene expression data

  • Authors:
  • Peter Peng;Mohamad Nagi;Omer Şair;Iyad Suleiman;Ala Qabaja;Abdallah M. ElSheikh;Shang Gao;Tansel Özyer;Keivan Kianmehr;Ghada Naji;Mick Ridley;Jon Rokne;Reda Alhajj

  • Affiliations:
  • Computer Science Dept, University of Calgary, Calgary, Alberta, Canada;School of Computing, University of Bradford, Bradford, UK;Dept of Computer Engineering, TOBB University, Ankara, Turkey;School of Computing, University of Bradford, Bradford, UK;Computer Science Dept, University of Calgary, Calgary, Alberta, Canada;Computer Science Dept, University of Calgary, Calgary, Alberta, Canada;Computer Science Dept, University of Calgary, Calgary, Alberta, Canada;Dept of Computer Engineering, TOBB University, Ankara, Turkey;Dept of Elect. & Comp. Eng., University of Western Ontario, London, ON, Canada;Department of Biology, Lebanese University, Tripoli, Lebanon;School of Computing, University of Bradford, Bradford, UK;Computer Science Dept, University of Calgary, Calgary, Alberta, Canada;Computer Science Dept, University of Calgary, Calgary, Alberta, Canada and Department of Computer Science, Global University, Beirut, Lebanon

  • Venue:
  • IDEAL'11 Proceedings of the 12th international conference on Intelligent data engineering and automated learning
  • Year:
  • 2011

Abstract

The major contribution of the work described in this paper is a parameter-free clustering approach that distributes the given data instances into the most appropriate clusters. This goal is realized in several steps. First, we apply a multi-objective genetic algorithm to find alternative clustering solutions that constitute the Pareto front. The result is a pool of the clusters reported by all the solutions. Then, we measure the homogeneity of each cluster in the pool and keep the most homogeneous clusters. These clusters may not all come from a single solution, because the solution favored most under the multiple objectives might contain some clusters that are less homogeneous than the best clusters in other solutions. Finally, since a given data instance may belong to more than one of the retained clusters, we restrict its membership to the cluster whose centroid is closest to the instance. Many applications, such as gene expression data analysis, need such a parameter-free approach because the correctness of the post-processing depends directly on the outcome of the clustering process. We demonstrate the applicability and effectiveness of the proposed clustering approach through experiments on two benchmark data sets.
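The steps above, namely pooling the clusters of the Pareto-front solutions, keeping the most homogeneous ones, and resolving multiple memberships by the nearest centroid, can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The genetic algorithm itself is omitted, and each solution is assumed to be a complete label vector over the instances. The homogeneity score (mean distance to the centroid) and the greedy rule that keeps clusters until every instance is covered are hypothetical choices, because the abstract does not specify either one. The function names `pareto_front`, `cluster_pool`, `spread`, and `robust_clustering` are illustrative.

```python
import numpy as np


def pareto_front(objectives):
    """Indices of non-dominated solutions, assuming every objective is minimized."""
    objs = np.asarray(objectives, dtype=float)
    front = []
    for i, a in enumerate(objs):
        dominated = any(
            np.all(b <= a) and np.any(b < a)
            for j, b in enumerate(objs) if j != i
        )
        if not dominated:
            front.append(i)
    return front


def cluster_pool(solutions):
    """Pool every cluster (a frozenset of instance indices) reported by any solution."""
    pool = set()
    for labels in solutions:
        labels = np.asarray(labels)
        for lab in np.unique(labels):
            pool.add(frozenset(np.flatnonzero(labels == lab).tolist()))
    return list(pool)


def spread(X, members):
    """Hypothetical homogeneity score: mean distance to the centroid (lower = more homogeneous)."""
    pts = X[sorted(members)]
    return float(np.linalg.norm(pts - pts.mean(axis=0), axis=1).mean())


def robust_clustering(X, solutions):
    """Return one label per instance; labels index the retained clusters."""
    X = np.asarray(X, dtype=float)
    # Rank pooled clusters from most to least homogeneous (ties broken by members).
    pool = sorted(cluster_pool(solutions), key=lambda c: (spread(X, c), sorted(c)))
    # Hypothetical selection rule: greedily keep a cluster if it covers a new
    # instance, stopping once every instance is covered by some kept cluster.
    kept, covered = [], set()
    for c in pool:
        if not c <= covered:
            kept.append(c)
            covered |= c
        if len(covered) == len(X):
            break
    centroids = np.array([X[sorted(c)].mean(axis=0) for c in kept])
    # An instance in several kept clusters goes to the one with the nearest centroid.
    labels = np.empty(len(X), dtype=int)
    for i in range(len(X)):
        candidates = [k for k, c in enumerate(kept) if i in c]
        dists = [np.linalg.norm(X[i] - centroids[k]) for k in candidates]
        labels[i] = candidates[int(np.argmin(dists))]
    return labels
```

Because the assumed `spread` score is lower for small clusters, this sketch tends to favor small clusters whenever they appear in the pool. A faithful homogeneity measure would likely need to account for cluster size, which the abstract leaves unspecified.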