Reporting and analyzing alternative clustering solutions by employing multi-objective genetic algorithm and conducting experiments on cancer data

Authors:
Peter Peng;Omer Addam;Mohamad Elzohbi;Sibel T. Özyer;Ahmad Elhajj;Shang Gao;Yimin Liu;Tansel Özyer;Mehmet Kaya;Mick Ridley;Jon Rokne;Reda Alhajj
Affiliations:
Department of Computer Science, University of Calgary, Calgary, Alberta, Canada;Department of Computer Science, University of Calgary, Calgary, Alberta, Canada;Department of Computer Science, University of Calgary, Calgary, Alberta, Canada;Department of Computer Engineering, Cankaya University, Ankara, Turkey;Department of Computing, University of Bradford, Bradford, UK;Department of Computer Science, University of Calgary, Calgary, Alberta, Canada;Department of Computer Science, University of Calgary, Calgary, Alberta, Canada;Department of Computer Engineering, TOBB University, Ankara, Turkey;Department of Computer Engineering, Firat University 23119, Elazig, Turkey;Department of Computing, University of Bradford, Bradford, UK;Department of Computer Science, University of Calgary, Calgary, Alberta, Canada;Department of Computer Science, University of Calgary, Calgary, Alberta, Canada and Department of Computer Science, Global University, Beirut, Lebanon
Venue:
Knowledge-Based Systems
Year:
2014

Abstract

Clustering is an essential research problem that has received considerable attention in the research community for decades. It is a challenge because there is no unique solution that fits all problems and satisfies all applications. We aim to find the most appropriate clustering solution for a given application domain. In particular, clustering algorithms in general require the number of clusters to be specified in advance, and this is hard even for domain experts to estimate, especially in a dynamic environment where the data change and/or become available incrementally. In this paper, we describe and analyze the effectiveness of a robust clustering algorithm that integrates a multi-objective genetic algorithm into a framework capable of producing alternative clustering solutions; it is called the Multi-objective K-Means Genetic Algorithm (MOKGA). We investigate its application to clustering a variety of datasets, including microarray gene expression data. The reported results are promising. Though we concentrate on gene expression and mostly cancer data, the proposed approach is general enough and works equally well for clustering other datasets, as demonstrated by the two datasets Iris and Ruspini. After running MOKGA, a Pareto-optimal front is obtained, which gives the optimal numbers of clusters as a solution set. The achieved clustering results are then analyzed and validated using several cluster validity techniques proposed in the literature. As a result, the optimal solutions are ranked under each validity index. We apply majority voting to decide on the most appropriate set of validity indexes applicable to every tested dataset. The proposed clustering approach is tested by conducting experiments using seven well-cited benchmark datasets. The obtained results are compared with those reported in the literature to demonstrate the applicability and effectiveness of the proposed approach.
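
The two selection steps mentioned in the abstract, extracting a Pareto-optimal front and voting across validity indexes, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes each candidate solution is summarized by two objectives to be minimized, the number of clusters and a total within-cluster variation (the abstract does not name the objectives), and it assumes each validity index casts one vote for the number of clusters it ranks best. The function names `pareto_front` and `majority_vote`, the sample numbers, and the index names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Assumed summary of a candidate solution:
# (number of clusters, total within-cluster variation), both minimized.
Solution = Tuple[int, float]


def pareto_front(solutions: List[Solution]) -> List[Solution]:
    """Return the solutions that no other solution dominates."""
    front = []
    for s in solutions:
        # o dominates s if o is no worse in both objectives and differs from s
        # (so it is strictly better in at least one objective).
        dominated = any(
            o[0] <= s[0] and o[1] <= s[1] and o != s for o in solutions
        )
        if not dominated:
            front.append(s)
    return sorted(set(front))


def majority_vote(best_k_per_index: Dict[str, int]) -> int:
    """Return the number of clusters chosen by the most validity indexes."""
    counts = Counter(best_k_per_index.values())
    return counts.most_common(1)[0][0]


# Illustrative values only; not results from the paper.
candidates = [(2, 90.0), (3, 40.0), (3, 55.0), (4, 40.0), (5, 25.0)]
front = pareto_front(candidates)

# Hypothetical index names; each maps to the k that index ranks best.
votes = {"index_a": 3, "index_b": 3, "index_c": 5}
chosen_k = majority_vote(votes)
```

Here `(3, 55.0)` and `(4, 40.0)` are dropped because `(3, 40.0)` is at least as good in both objectives, so the front keeps one entry per useful trade-off between fewer clusters and tighter clusters. Ties in the vote are not handled in this sketch.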