Minimum sum-of-squares clustering by DC programming and DCA

  • Authors:
  • Le Thi Hoai An; Pham Dinh Tao

  • Affiliations:
  • Laboratory of Theoretical and Applied Computer Science, UFR, MIM, University of Paul Verlaine - Metz, Metz, France; Laboratory of Modelling, Optimization & Operations Research, National Institute for Applied Sciences - Rouen, Mont Saint Aignan Cedex, France

  • Venue:
  • ICIC'09 Proceedings of the Intelligent computing 5th international conference on Emerging intelligent computing technology and applications
  • Year:
  • 2009

Abstract

In this paper, we propose a new approach based on DC (Difference of Convex functions) programming and DCA (DC Algorithm) to perform clustering via the minimum sum-of-squares Euclidean distance criterion. The so-called Minimum Sum-of-Squares Clustering (MSSC) problem is first formulated as a hard combinatorial optimization problem. It is then recast as a continuous DC program by means of exact penalty techniques in DC programming. A DCA scheme is then investigated. The resulting DCA is original and very inexpensive: at each iteration it only requires computing projections of points onto a simplex and/or onto a ball, all of which are available in explicit form. Numerical results on real-world data sets show the efficiency of DCA and its great superiority over K-means, a standard clustering method.
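
The abstract names two primitives, projection onto a simplex and projection onto a ball, and one criterion, the sum of squared Euclidean distances. The sketch below illustrates these three pieces only; it is not the authors' DCA scheme, whose details the abstract does not give. The function names, the choice of the unit simplex, the ball centered at the origin, and the well-known sort-based simplex projection are all assumptions made for illustration.

```python
import math


def project_onto_simplex(v):
    """Euclidean projection of v onto {x : x_i >= 0, sum(x) = 1}.

    Assumption: the unit simplex and the standard sort-based method;
    the paper may use a different but equivalent explicit formula.
    """
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        t = (cumulative - 1.0) / j
        if u_j - t > 0:
            theta = t  # keep the threshold of the last index that passes
    return [max(x - theta, 0.0) for x in v]


def project_onto_ball(v, radius=1.0):
    """Euclidean projection of v onto the origin-centered ball of given radius."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= radius:
        return list(v)  # already inside the ball
    scale = radius / norm
    return [scale * x for x in v]


def mssc_objective(points, centers, labels):
    """Sum of squared Euclidean distances from each point to its assigned center."""
    return sum(
        sum((p - c) ** 2 for p, c in zip(point, centers[label]))
        for point, label in zip(points, labels)
    )
```

Both projections are closed-form up to a sort, which is consistent with the abstract's remark that each iteration is inexpensive. For instance, `project_onto_ball([3.0, 4.0])` rescales the vector to unit length, and `project_onto_simplex([2.0, 0.0])` moves all the weight onto the first coordinate.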