Transparent redundant computing with MPI

  • Authors:
  • Ron Brightwell;Kurt Ferreira;Rolf Riesen

  • Affiliations:
  • Sandia National Laboratories, Albuquerque, NM;Sandia National Laboratories, Albuquerque, NM;Sandia National Laboratories, Albuquerque, NM

  • Venue:
  • EuroMPI'10 Proceedings of the 17th European MPI users' group meeting conference on Recent advances in the message passing interface
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

Extreme-scale parallel systems will require alternative methods for applications to maintain current levels of uninterrupted execution. Redundant computation is one approach to consider, if the benefits of increased resiliency outweigh the cost of consuming additional resources. We describe a transparent redundancy approach for MPI applications and detail two different implementations that provide the ability to tolerate a range of failure scenarios, including loss of application processes and connectivity. We compare these two approaches and show performance results from micro-benchmarks that bound worstcase message passing performance degradation. We propose several enhancements that could lower the overhead of providing resiliency through redundancy.