Fault tolerant supercomputing: a software approach

  • Authors:
  • E. Verentziotis;T. Varvarigou;D. Vergados;G. Deconinck

  • Affiliations:
  • Dept. of Elect. & Comp. Eng., National Technical University of Athens, Iroon Politechniou 9, 15733 Zographou, GREECE;Dept. of Elect. & Comp. Eng., National Technical University of Athens, Iroon Politechniou 9, 15733 Zographou, GREECE;Dept. of Elect. & Comp. Eng., National Technical University of Athens, Iroon Politechniou 9, 15733 Zographou, GREECE;Dept. Elektrotechniek (ESAT), K.U.Leuven, Kard. Mercierlaan 94, 3001 Heverlee, BELGIUM

  • Venue:
  • Information processing and technology
  • Year:
  • 2001

Abstract

Adding fault tolerance to embedded supercomputing applications is becoming increasingly important, especially as these applications support critical parts of everyday life in the modern "Information Society". To this end, we present a software middleware framework that provides a collection of flexible, reusable fault tolerance modules, which act at different levels and address common fault tolerance requirements. The framework removes the burden of ad hoc fault tolerance programming from the application developer, while avoiding reliance on the merely average fault tolerance support available at the operating system level. A high-level description lets the developer specify the application's fault tolerance strategies as a kind of second application layer; this separates the functional aspects of an application from its fault tolerance aspects, shortening the development cycle and improving maintainability. The approach is validated by integrating this functionality into real embedded applications.