Fault-Tolerant Communication in Embedded Supercomputing

  • Authors:
  • Giorgos Efthivoulidis; Evangelos A. Verentziotis; Apostolos N. Meliones; Theodora A. Varvarigou; Antonios Kontizas; Geert Deconinck; Vincenzo De Florio

  • Venue:
  • IEEE Micro
  • Year:
  • 1998

Abstract

A framework is developed to integrate fault tolerance flexibly and easily into embedded parallel high-performance computing (HPC) applications. The framework consists of a variety of reusable fault tolerance modules that act at different levels and address common requirements. Application developers are relieved of the burden of ad hoc fault tolerance programming, while the mediocre fault tolerance support offered at the operating system level is avoided. Integrating this functionality into real embedded applications validates the approach and yields promising results. In this article we focus on fault tolerance mechanisms for synchronous and asynchronous communication between application threads running on system nodes.
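As a rough illustration of what a reusable fault tolerance module for synchronous communication might do, the sketch below wraps a blocking send in a generic acknowledgment, timeout, and retransmission scheme. This is not the authors' mechanism: it is a minimal sketch, assuming threads exchange messages through in-memory queues, and the names `send_sync`, `receiver`, `CommFault`, `timeout`, `retries`, and `drop_first` are all hypothetical. A peer that fails to acknowledge within the retry budget is reported as faulty instead of leaving the sender blocked forever.

```python
import queue
import threading


class CommFault(Exception):
    """Hypothetical error: a peer did not acknowledge within the retry budget."""


def send_sync(outbox, ack_box, msg, timeout=0.1, retries=3):
    """Hypothetical synchronous send with fault detection.

    Each attempt transmits `msg` and then waits up to `timeout` seconds for an
    acknowledgment. Returns the attempt number that succeeded; raises
    CommFault after `retries` unacknowledged attempts.
    """
    for attempt in range(1, retries + 1):
        outbox.put(msg)
        try:
            ack = ack_box.get(timeout=timeout)
        except queue.Empty:
            continue  # no acknowledgment in time: retransmit
        if ack == msg:
            return attempt
        # An unrelated acknowledgment is treated as a miss and we retransmit.
    raise CommFault(f"no acknowledgment for {msg!r} after {retries} attempts")


def receiver(inbox, ack_box, delivered, drop_first=0):
    """Hypothetical receiver thread.

    Ignores its first `drop_first` messages to emulate transient message loss,
    suppresses duplicates, acknowledges each accepted message, and stops on None.
    """
    seen = 0
    while True:
        msg = inbox.get()
        if msg is None:
            return  # shutdown sentinel
        seen += 1
        if seen <= drop_first:
            continue  # simulated lost message: no delivery, no acknowledgment
        if msg not in delivered:
            delivered.append(msg)  # deliver once even if retransmitted
        ack_box.put(msg)


if __name__ == "__main__":
    inbox, acks, delivered = queue.Queue(), queue.Queue(), []
    t = threading.Thread(target=receiver, args=(inbox, acks, delivered, 1))
    t.start()
    print("acknowledged on attempt", send_sync(inbox, acks, "hello", timeout=0.5))
    inbox.put(None)
    t.join()
```

With one simulated loss, the first transmission times out and the second is acknowledged. If no receiver is running at all, `send_sync` raises `CommFault`, which is the point at which a higher-level recovery module could take over.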