Genetically Improved PSO Algorithm for Efficient Data Clustering

  • Authors:
  • Rehab F. Abdel-Kader

  • Venue:
  • ICMLC '10 Proceedings of the 2010 Second International Conference on Machine Learning and Computing
  • Year:
  • 2010

Abstract

Clustering is an important research topic in data mining, with a wide range of applications in unsupervised classification. Partitional clustering algorithms such as k-means are the most popular choice for clustering large datasets. The major problem with k-means is that it is sensitive to the choice of initial partitions and may converge to local optima. In this paper, we present a hybrid two-phase GAI-PSO+k-means data clustering algorithm that clusters data quickly and can avoid premature convergence to local optima. In the first phase, we use a new genetically improved particle swarm optimization algorithm (GAI-PSO), a population-based heuristic search technique that hybridizes the cultural and social rules derived from swarm intelligence (PSO) with the concepts of natural selection and evolution (GA). GAI-PSO combines the standard velocity and position update rules of PSO with the selection, crossover, and mutation operators of GAs, and searches the solution space for the best initial cluster centroids for the next phase. The second phase is a local refinement stage in which the k-means algorithm, starting from these centroids, converges efficiently to an optimal solution. The proposed algorithm thus combines the global search ability of evolutionary algorithms with the fast convergence of k-means while avoiding the drawbacks of each. Its performance is evaluated on several benchmark datasets. The experimental results show that the proposed algorithm is highly robust and outperforms previous approaches such as simulated annealing (SA), ant colony optimization (ACO), PSO, and k-means on the partitional clustering problem.
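The abstract describes the two-phase scheme only at a high level, so the following is a minimal Python sketch under stated assumptions, not the author's implementation. Assumptions: each particle encodes k centroids as one flat vector; fitness is the sum of squared errors (SSE) to the nearest centroid (the paper's exact fitness function is not given here); the GA step (replace the worse half of the swarm with offspring of tournament-selected parents, centroid-wise uniform crossover, Gaussian mutation) is a hypothetical scheme; and w=0.72, c1=c2=1.49 are commonly used PSO settings, not values taken from the paper. All function names (`gai_pso`, `kmeans`, `gai_pso_kmeans`) are illustrative.

```python
import random

def sse(points, centroids):
    """Sum of squared Euclidean distances from each point to its nearest centroid."""
    return sum(min(sum((p - c) ** 2 for p, c in zip(pt, cen)) for cen in centroids)
               for pt in points)

def decode(pos, k, d):
    """Split a flat particle position of length k*d into k centroid tuples."""
    return [tuple(pos[c * d:(c + 1) * d]) for c in range(k)]

def gai_pso(points, k, n_particles=20, iters=50, w=0.72, c1=1.49, c2=1.49,
            p_mut=0.1, seed=0):
    """Phase 1 (assumed design): search for good initial centroids. Needs n_particles >= 4."""
    rng = random.Random(seed)
    d = len(points[0])
    span = [max(col) - min(col) for col in zip(*points)]
    fit = lambda pos: sse(points, decode(pos, k, d))  # assumed fitness: SSE
    # Each particle starts with k distinct data points as its centroids.
    swarm = [[x for pt in rng.sample(points, k) for x in pt] for _ in range(n_particles)]
    vel = [[0.0] * (k * d) for _ in range(n_particles)]
    pbest = [x[:] for x in swarm]
    pbest_f = [fit(x) for x in swarm]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    half = n_particles // 2
    for _ in range(iters):
        # PSO step: standard inertia-weight velocity and position update.
        for i, x in enumerate(swarm):
            for j in range(k * d):
                r1, r2 = rng.random(), rng.random()
                vel[i][j] = (w * vel[i][j] + c1 * r1 * (pbest[i][j] - x[j])
                             + c2 * r2 * (gbest[j] - x[j]))
                x[j] += vel[i][j]
        cur = [fit(x) for x in swarm]
        # GA step (hypothetical scheme): replace the worse half of the swarm with
        # offspring of tournament-selected parents drawn from the better half.
        ranked = sorted(range(n_particles), key=lambda i: cur[i])
        for slot in ranked[half:]:
            a = swarm[ranked[min(rng.sample(range(half), 2))]]
            b = swarm[ranked[min(rng.sample(range(half), 2))]]
            child = []
            for c in range(k):  # uniform crossover that swaps whole centroids
                child.extend((a if rng.random() < 0.5 else b)[c * d:(c + 1) * d])
            for j in range(k * d):  # Gaussian mutation scaled to each dimension's range
                if rng.random() < p_mut:
                    child[j] += rng.gauss(0.0, 0.1 * span[j % d])
            swarm[slot], vel[slot], cur[slot] = child, [0.0] * (k * d), fit(child)
        # Update personal and global bests.
        for i, x in enumerate(swarm):
            if cur[i] < pbest_f[i]:
                pbest[i], pbest_f[i] = x[:], cur[i]
                if cur[i] < gbest_f:
                    gbest, gbest_f = x[:], cur[i]
    return decode(gbest, k, d)

def kmeans(points, centroids, max_iter=100):
    """Phase 2: Lloyd's k-means refinement starting from the given centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for pt in points:
            nearest = min(range(len(centroids)),
                          key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centroids[c])))
            clusters[nearest].append(pt)
        # Empty clusters keep their previous centroid.
        new = [tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centroids[i]
               for i, cl in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return centroids

def gai_pso_kmeans(points, k, **pso_kwargs):
    """Two-phase hybrid: GA-improved PSO seeding followed by k-means refinement."""
    return kmeans(points, gai_pso(points, k, **pso_kwargs))

pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
print(sorted(gai_pso_kmeans(pts, 2)))  # [(0.5, 0.5), (10.5, 10.5)]
```

The division of labor mirrors the abstract: the population-based phase only has to place one centroid in each basin, and k-means then converges in a few cheap iterations. Replacing the worse half of the swarm each generation is one of several plausible ways to blend GA operators into PSO; the paper may select, cross over, or mutate differently.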