Exploitation of a parallel clustering algorithm on commodity hardware with P2P-MPI

  • Authors:
  • Stéphane Genaud;Pierre Gançarski;Guillaume Latu;Alexandre Blansché;Choopan Rattanapoka;Damien Vouriot

  • Affiliations:
  • LSIIT-ICPS, Louis Pasteur University, Strasbourg, UMR 7005 CNRS-ULP, Illkirch, France 67412;LSIIT-AFD, Louis Pasteur University, Strasbourg, UMR 7005 CNRS-ULP, Illkirch, France 67412;LSIIT-ICPS, Louis Pasteur University, Strasbourg, UMR 7005 CNRS-ULP, Illkirch, France 67412;LSIIT-AFD, Louis Pasteur University, Strasbourg, UMR 7005 CNRS-ULP, Illkirch, France 67412;LSIIT-ICPS, Louis Pasteur University, Strasbourg, UMR 7005 CNRS-ULP, Illkirch, France 67412;LSIIT-AFD, Louis Pasteur University, Strasbourg, UMR 7005 CNRS-ULP, Illkirch, France 67412

  • Venue:
  • The Journal of Supercomputing
  • Year:
  • 2008

Abstract

The goal of clustering is to identify subsets, called clusters, whose objects are usually more similar to each other than they are to objects from other clusters. We have proposed the MACLAW method, a cooperative coevolution algorithm for data clustering, which has shown good results (Blansché and Gançarski, Pattern Recognit. Lett. 27(11), 1299-1306, 2006). However, the complexity of the algorithm increases rapidly with the number of clusters to find. In this article, we propose a parallelization of MACLAW based on the message-passing paradigm, and we analyze the application's performance using experimental results. We show that we reach near-optimal speedups when searching for 16 clusters, a typical problem instance for which the sequential execution time is an obstacle to using the MACLAW method. Further, our approach is original because we use the P2P-MPI grid middleware (Genaud and Rattanapoka, Lecture Notes in Comput. Sci., vol. 3666, pp. 276-284, 2005), which provides both the message-passing library and the infrastructure services to discover computing resources. We also argue that the application can be tightly coupled with the middleware to make parallel execution nearly transparent for the user.
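
The abstract does not state how "near optimal speedup" is measured. Under the conventional definitions, which are assumed here and are not taken from the paper, speedup is the sequential run time divided by the parallel run time, and efficiency is speedup divided by the number of processes, so an optimal run on p processes has speedup p and efficiency 1. A minimal sketch, with hypothetical function names and no timings from the article:

```python
def speedup(t_sequential: float, t_parallel: float) -> float:
    """Conventional speedup: sequential wall-clock time over parallel time."""
    if t_sequential <= 0 or t_parallel <= 0:
        raise ValueError("timings must be positive")
    return t_sequential / t_parallel


def efficiency(t_sequential: float, t_parallel: float, processes: int) -> float:
    """Speedup normalized by process count; 1.0 means optimal (linear) speedup."""
    if processes < 1:
        raise ValueError("processes must be at least 1")
    return speedup(t_sequential, t_parallel) / processes
```

With these definitions, a run would count as near optimal when `efficiency(...)` stays close to 1.0 as the number of processes grows.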