Communication target selection for replicated MPI processes

  • Authors:
  • Rakhi Anand;Edgar Gabriel;Jaspal Subhlok

  • Affiliations:
  • Department of Computer Science, University of Houston;Department of Computer Science, University of Houston;Department of Computer Science, University of Houston

  • Venue:
  • EuroMPI'10 Proceedings of the 17th European MPI users' group meeting conference on Recent advances in the message passing interface
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

VolpexMPI is an MPI library designed for volunteer computing environments. In order to cope with the fundamental unreliability of these environments, VolpexMPI deploys two or more replicas of each MPI process. A receiver-driven communication scheme is employed to eliminate redundant message exchanges and sender based logging is employed to ensure seamless application progress with varying processor execution speeds and routine failures. In this model, to execute a receive operation, a decision has to be made as to which of the sending process replicas should be contacted first. Contacting the fastest replica appears to be the optimal local decision, but it can be globally non-optimal as it may slowdown the fastest replica. Further, identifying the fastest replica during execution is a challenge in itself. This paper evaluates various target selection algorithms to manage these trade-offs with the objective of minimizing the overall execution time. The algorithms are evaluated for the NAS Parallel Benchmarks utilizing heterogeneous network configurations, heterogeneous processor configurations and a combination of both.