Reaching Efficient Fault-Tolerance for Cooperative Applications

  • Authors: Peter Sobe

  • Affiliations: -

  • Venue: IPDS '00 Proceedings of the 4th International Computer Performance and Dependability Symposium
  • Year: 2000

Abstract

Cooperative applications are widely used, e.g. as parallel calculations or distributed information processing systems. While such applications meet users' demands and offer a performance improvement, they are more susceptible to faults, because a fault on any of the computer nodes used can affect them. Often a single fault causes a complete application failure. On the other hand, the redundancy in distributed systems can be utilized for fast fault detection and recovery. We therefore followed an approach based on duplicating each application process to detect crashes and faulty functions of single computer nodes. We concentrate on two aspects of efficient fault tolerance: fast fault detection, and recovery that does not significantly delay application progress. The contribution of this work is twofold. First, we present a new fault-detection protocol for duplicated processes. Second, we enhance a roll-forward recovery scheme so that it is applicable to a set of cooperative processes in conformity with the communication protocol.
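
The general idea of detecting faults by duplicating a process can be illustrated with a minimal sketch. This is not the protocol from the paper, whose details the abstract does not give; it only shows the basic principle of running two replicas of one computation step and comparing their results. The names `run_replica` and `duplex_step` and the status strings are hypothetical, and a raised exception stands in for a node crash.

```python
from typing import Any, Callable, Tuple


def run_replica(step: Callable[[Any], Any], state: Any) -> Tuple[bool, Any]:
    """Run one replica of a step. A raised exception models a crash (assumption)."""
    try:
        return True, step(state)
    except Exception:
        return False, None


def duplex_step(primary: Callable[[Any], Any],
                shadow: Callable[[Any], Any],
                state: Any) -> Tuple[str, Any]:
    """Run both replicas on the same state and compare their results.

    Returns (status, result), where status is one of:
      "ok"       - both replicas finished and agree
      "crash"    - at least one replica failed; result is the survivor's
                   output, or None if both failed
      "mismatch" - both finished but disagree (a faulty function somewhere)
    """
    ok_a, res_a = run_replica(primary, state)
    ok_b, res_b = run_replica(shadow, state)
    if not ok_a or not ok_b:
        survivor = res_a if ok_a else (res_b if ok_b else None)
        return "crash", survivor
    if res_a != res_b:
        # Comparison alone cannot tell which replica is the faulty one.
        return "mismatch", None
    return "ok", res_a
```

In this sketch a crash can be masked by continuing with the surviving replica, whereas a mismatch only signals that one of the two results is wrong. Deciding which one, and recovering without stalling the application, is the part a recovery scheme such as the roll-forward approach mentioned in the abstract would have to handle.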