Dodging the cost of unavoidable memory copies in message logging protocols

  • Authors:
  • George Bosilca;Aurelien Bouteiller;Thomas Herault;Pierre Lemarinier;Jack J. Dongarra

  • Affiliations:
  • University of Tennessee, TN;University of Tennessee, TN;University of Tennessee, TN and Universite Paris-Sud, INRIA, France;University of Tennessee, TN;University of Tennessee, TN and Oak Ridge National Laboratory, TN

  • Venue:
  • EuroMPI'10 Proceedings of the 17th European MPI users' group meeting conference on Recent advances in the message passing interface
  • Year:
  • 2010

Abstract

With the number of computing elements spiraling to hundreds of thousands in modern HPC systems, failures are common events. Nevertheless, few applications are fault tolerant; most are in need of a seamless recovery framework. Among the automatic fault-tolerant techniques proposed for MPI, message logging is preferable for its scalable recovery. The major challenge for message logging protocols is the performance penalty on communications during failure-free periods, mostly coming from the payload copy introduced for each message. In this paper, we investigate different approaches for logging payload and compare their impact on network performance.