Low cost self-healing in MPI applications

  • Authors:
  • Jacques A. Da Silva; Vinod E. F. Rebello

  • Affiliations:
  • Instituto de Computação, Universidade Federal Fluminense, Brazil; Instituto de Computação, Universidade Federal Fluminense, Brazil

  • Venue:
  • PVM/MPI'07: Proceedings of the 14th European Conference on Recent Advances in Parallel Virtual Machine and Message Passing Interface
  • Year:
  • 2007

Abstract

Writing applications capable of executing efficiently in Grids is extremely difficult and tedious for inexperienced users. The distributed resources are typically heterogeneous and non-dedicated, and are offered without any performance or availability guarantees. Systems capable of adapting the execution of an application to the dynamic characteristics of the Grid are therefore essential. This work describes the strategy used to bestow the self-healing property on autonomic EasyGrid MPI applications so that they can withstand process and resource failures. The paper highlights both the difficulties encountered and the low-cost solution adopted to offer fault tolerance in applications based on the standard Grid installation of LAM/MPI.