GAKREM: A novel hybrid clustering algorithm

  • Authors:
  • Cao D. Nguyen;Krzysztof J. Cios

  • Affiliations:
  • Virginia Commonwealth University, Computer Science, 401 W. Main Street, Richmond, VA 23238, USA;Virginia Commonwealth University, Computer Science, 401 W. Main Street, Richmond, VA 23238, USA and IITiS Polish Academy of Sciences, Poland

  • Venue:
  • Information Sciences: an International Journal
  • Year:
  • 2008

Quantified Score

Hi-index 0.07

Visualization

Abstract

We introduce a novel clustering algorithm named GAKREM (Genetic Algorithm K-means Logarithmic Regression Expectation Maximization) that combines the best characteristics of the K-means and EM algorithms but avoids their weaknesses such as the need to specify a priori the number of clusters, termination in local optima, and lengthy computations. To achieve these goals, genetic algorithms for estimating parameters and initializing starting points for the EM are used first. Second, the log-likelihood of each configuration of parameters and the number of clusters resulting from the EM is used as the fitness value for each chromosome in the population. The novelty of GAKREM is that in each evolving generation it efficiently approximates the log-likelihood for each chromosome using logarithmic regression instead of running the conventional EM algorithm until its convergence. Another novelty is the use of K-means to initially assign data points to clusters. The algorithm is evaluated by comparing its performance with the conventional EM algorithm, the K-means algorithm, and the likelihood cross-validation technique on several datasets.