Application-Level Fault Tolerance as a Complement to System-Level Fault Tolerance

  • Authors:
  • Joshua Haines; Vijay Lakamraju; Israel Koren; C. Mani Krishna

  • Affiliations:
  • Electrical and Computer Engineering Dept., University of Massachusetts, Amherst, MA 01003, jhaines@ecs.umass.edu; Electrical and Computer Engineering Dept., University of Massachusetts, Amherst, MA 01003, vlakamra@ecs.umass.edu; Electrical and Computer Engineering Dept., University of Massachusetts, Amherst, MA 01003, koren@ecs.umass.edu; Electrical and Computer Engineering Dept., University of Massachusetts, Amherst, MA 01003, krishna@ecs.umass.edu

  • Venue:
  • The Journal of Supercomputing - Special issue on embedded fault-tolerance systems
  • Year:
  • 2000

Abstract

As multiprocessor systems become more complex, their reliability will need to increase as well. In this paper we propose a novel technique that is applicable to a wide variety of distributed real-time systems, especially those exhibiting data parallelism. System-level fault tolerance involves reliability techniques incorporated within the system hardware and software, whereas application-level fault tolerance involves reliability techniques incorporated within the application software. We assert that, for high reliability, a combination of system-level fault tolerance and application-level fault tolerance works best. In many systems, application-level fault tolerance can be used to bridge the gap when system-level fault tolerance alone does not provide the required reliability. We exemplify this with the RTHT target tracking benchmark and the ABF beamforming benchmark.