VolpexMPI: An MPI Library for Execution of Parallel Applications on Volatile Nodes

  • Authors:
  • Troy Leblanc; Rakhi Anand; Edgar Gabriel; Jaspal Subhlok

  • Affiliations:
  • Department of Computer Science, University of Houston (all authors)

  • Venue:
  • Proceedings of the 16th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
  • Year:
  • 2009

Abstract

The objective of this research is to convert ordinary idle PCs into virtual clusters for executing parallel applications. The paper introduces VolpexMPI, which is designed to enable seamless forward application progress in the presence of frequent node failures as well as dynamically changing network speeds and node execution speeds. Process replication is employed to provide robustness in such volatile environments. The central challenge in the VolpexMPI design is to efficiently and automatically manage a dynamically varying number of process replicas in different states of execution progress. The key fault tolerance technique employed is fully distributed sender-based logging. The paper presents the design and a prototype implementation of VolpexMPI. Preliminary results indicate that the overhead of providing robustness is modest for applications with a favorable ratio of communication to computation and a low degree of communication.
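
To make the idea of sender-based logging with replicas at different stages of progress more concrete, the following is a minimal Python sketch. It is not the VolpexMPI API or its implementation: the class names `SenderLog` and `ReceiverReplica`, the sequence-number scheme, and the pull-style retrieval are all assumptions made for this illustration. It only shows the general principle that a sender keeps its outgoing messages locally, so a slower replica of the receiver can still obtain a message that a faster replica consumed earlier.

```python
class SenderLog:
    """Hypothetical per-process log of outgoing messages (illustration only)."""

    def __init__(self):
        # (dest_rank, tag) -> list of payloads, indexed by send sequence number
        self._log = {}

    def send(self, dest_rank, tag, payload):
        """Record the message in the sender's local log; return its sequence number."""
        entries = self._log.setdefault((dest_rank, tag), [])
        entries.append(payload)
        return len(entries) - 1

    def fetch(self, dest_rank, tag, seq):
        """Return the logged message, or None if it has not been sent yet."""
        entries = self._log.get((dest_rank, tag), [])
        return entries[seq] if 0 <= seq < len(entries) else None


class ReceiverReplica:
    """One replica of a receiving rank; each replica tracks its own progress."""

    def __init__(self, rank):
        self.rank = rank
        self._next_seq = {}  # tag -> next sequence number this replica will read

    def recv(self, sender_log, tag):
        """Pull the next unread message for this replica from the sender's log."""
        seq = self._next_seq.get(tag, 0)
        payload = sender_log.fetch(self.rank, tag, seq)
        if payload is not None:
            self._next_seq[tag] = seq + 1
        return payload


def demo():
    """Two replicas of rank 1 read the same logged messages at different times."""
    log = SenderLog()
    log.send(1, 0, "step-0")
    log.send(1, 0, "step-1")

    fast = ReceiverReplica(1)
    slow = ReceiverReplica(1)

    fast_msgs = [fast.recv(log, 0), fast.recv(log, 0)]
    # The slow replica starts later and still receives the first message.
    slow_msgs = [slow.recv(log, 0)]
    return fast_msgs, slow_msgs
```

In this sketch, calling `demo()` lets the fast replica read both messages while the slow replica, starting later, still obtains the first one from the sender's log. Because each sender holds only its own log, no central message store is involved, which is the sense in which such logging can be fully distributed.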