Synergistic Coordination between Software and Hardware Fault Tolerance Techniques

  • Authors:
  • Ann T. Tai;Kam S. Tso;Leon Alkalai;Savio N. Chau;William H. Sanders

  • Affiliations:
  • -;-;-;-;-

  • Venue:
  • DSN '01 Proceedings of the 2001 International Conference on Dependable Systems and Networks (formerly: FTCS)
  • Year:
  • 2001

Quantified Score

Hi-index 0.00

Visualization

Abstract

Abstract: This paper describes an approach for enabling the synergistic coordination between two fault tolerance protocols to simultaneously tolerate software and hardware faults in a distributed computing environment. Specifically, our approach is based on a message-driven confidence-driven (MDCD) protocol that we have devised for tolerating software design faults, and a time-based (TB) checkpointing protocol that was developed by Neves and Fuchs for tolerating hardware faults. By carrying out algorithm modifications that are conducive to synergistic coordination between volatile-storage and stable-storage checkpoint establishments, we are able to circumvent the potential interference between the MDCD and TB protocols, and to allow them to effectively complement each other to extend a system's fault tolerance capability. Moreover, the protocol-coordination approach preserves and enhances the features and advantages of the individual protocols that participate in the coordination, keeping the performance cost low.