ScELA: scalable and extensible launching architecture for clusters

  • Authors:
  • Jaidev K. Sridhar;Matthew J. Koop;Jonathan L. Perkins;Dhabaleswar K. Panda

  • Affiliations:
  • Network-Based Computing Laboratory, The Ohio State University, Columbus, OH;Network-Based Computing Laboratory, The Ohio State University, Columbus, OH;Network-Based Computing Laboratory, The Ohio State University, Columbus, OH;Network-Based Computing Laboratory, The Ohio State University, Columbus, OH

  • Venue:
  • HiPC'08 Proceedings of the 15th international conference on High performance computing
  • Year:
  • 2008

Abstract

As cluster sizes head into tens of thousands, current job launch mechanisms do not scale as they are limited by resource constraints as well as performance bottlenecks. The job launch process includes two phases - spawning of processes on processors and information exchange between processes for job initialization. Implementations of various programming models follow distinct protocols for the information exchange phase. We present the design of a scalable, extensible and high-performance job launch architecture for very large scale parallel computing. We present implementations of this architecture which achieve a speedup of more than 700% in launching a simple Hello World MPI application on 10,240 processor cores and also scale to more than 3 times the number of processor cores compared to prior solutions.