FT-MPI, fault-tolerant metacomputing and generic name services: a case study

  • Authors:
  • David Dewolfs;Jan Broeckhove;Vaidy Sunderam;Graham E. Fagg

  • Affiliations:
  • Depts. of Math and Computer Science of the University of Antwerp, Emory University, the University of Tennessee, Antwerp, Atlanta, GA, Knoxville, TN, Belgium

  • Venue:
  • EuroPVM/MPI'06 Proceedings of the 13th European PVM/MPI User's Group conference on Recent advances in parallel virtual machine and message passing interface
  • Year:
  • 2006

Abstract

There is growing interest in deploying MPI over very large numbers of heterogeneous, geographically distributed resources. FT-MPI provides the fault tolerance necessary at this scale, but presents some issues when crossing multiple administrative domains. Using the H2O metacomputing framework, we add cross-administrative-domain interoperability and “pluggability” to FT-MPI. The latter feature allows us, using proxies, to transparently replace one vulnerable module, its name service, with fault-tolerant alternatives. We present an algorithm for improving the performance of operations over the proxies. We evaluate its performance by comparing the original name service, OpenLDAP, and HDNS, a current Emory research project.
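
The "pluggability" idea described in the abstract can be pictured with a minimal sketch: a proxy exposes a fixed name-service interface and forwards each call to whichever backend is currently plugged in. Everything below is an assumption made for illustration: the `NameService` interface, the `InMemoryNameService` backend, and the `swap_backend` method are hypothetical names, not FT-MPI, H2O, OpenLDAP, or HDNS APIs, and the sketch does not reproduce the paper's performance algorithm.

```python
from typing import Dict, Optional, Protocol


class NameService(Protocol):
    """Hypothetical minimal name-service interface (not FT-MPI's actual API)."""

    def register(self, name: str, address: str) -> None: ...

    def lookup(self, name: str) -> Optional[str]: ...


class InMemoryNameService:
    """Stand-in backend; a real deployment might wrap OpenLDAP or HDNS instead."""

    def __init__(self) -> None:
        self._entries: Dict[str, str] = {}

    def register(self, name: str, address: str) -> None:
        self._entries[name] = address

    def lookup(self, name: str) -> Optional[str]:
        # Returns None when the name is unknown.
        return self._entries.get(name)


class NameServiceProxy:
    """Forwards every call to the backend that is currently plugged in."""

    def __init__(self, backend: NameService) -> None:
        self._backend = backend

    def swap_backend(self, backend: NameService) -> None:
        # Callers keep using the proxy; only the target behind it changes.
        self._backend = backend

    def register(self, name: str, address: str) -> None:
        self._backend.register(name, address)

    def lookup(self, name: str) -> Optional[str]:
        return self._backend.lookup(name)
```

Because clients hold only the proxy, a backend can be exchanged without changing client code, which is the sense of "transparent" replacement assumed in this sketch.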