Dealing with failures during failure recovery of distributed systems

  • Authors:
  • Naveed Arshad;Dennis Heimbigner;Alexander L. Wolf

  • Affiliations:
  • University of Colorado, Boulder, CO;University of Colorado, Boulder, CO;University of Colorado, Boulder, CO

  • Venue:
  • DEAS '05 Proceedings of the 2005 workshop on Design and evolution of autonomic application software
  • Year:
  • 2005

Abstract

One of the characteristics of autonomic systems is self-recovery from failures. Self-recovery can be achieved by sensing failures, planning a recovery, and executing the recovery plan to bring the system back to a normal state. For various reasons, however, additional failures are possible while the system is recovering from the initial failure. Handling such secondary failures is important because they can cause the original recovery plan to fail and can leave the system in a more complicated state, one that is worse than before recovery began. This paper identifies techniques for preserving consistency while dealing with failures that occur during failure recovery.
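The sense-plan-execute cycle described above can be illustrated with a small sketch. This is a generic "abandon the stale plan and re-plan" loop, not the consistency-preserving techniques from the paper. The system state is modeled as a hypothetical dict of component names to up/down flags. The functions `sense`, `plan`, `recover`, and the `restart` actuator are all illustrative assumptions. A `restart` call that returns `False` stands in for a secondary failure. When that happens, the loop stops executing the remaining steps of the current plan, re-senses the actual state, and plans again, up to a fixed number of rounds.

```python
from typing import Callable, Dict, List

# Hypothetical system model: component name -> True if running.
State = Dict[str, bool]


def sense(state: State) -> List[str]:
    """Return the components currently observed as failed, sorted by name."""
    return sorted(name for name, up in state.items() if not up)


def plan(failed: List[str]) -> List[str]:
    """Hypothetical planner: restart every failed component in name order."""
    return list(failed)


def recover(state: State,
            restart: Callable[[State, str], bool],
            max_rounds: int = 5) -> bool:
    """Sense-plan-execute loop that re-plans when a step fails mid-recovery.

    `restart` is a hypothetical actuator that may mutate `state` and returns
    True on success. A False return models a secondary failure: the rest of
    the now-stale plan is abandoned and the next round re-senses the state.
    """
    for _ in range(max_rounds):
        failed = sense(state)
        if not failed:
            return True  # back to a normal state
        for component in plan(failed):
            if not restart(state, component):
                break  # secondary failure: stop executing the stale plan
    return not sense(state)


# Usage: the first attempt to restart "db" fails and also crashes "cache".
attempts = {"db": 0}


def flaky_restart(state: State, name: str) -> bool:
    if name == "db" and attempts["db"] == 0:
        attempts["db"] += 1
        state["cache"] = False  # secondary failure introduced during recovery
        return False
    state[name] = True
    return True


system = {"web": True, "db": False, "cache": True}
print(recover(system, flaky_restart))  # → True
```

In the first round the plan is simply "restart db". That step fails and takes `cache` down with it. Continuing the original plan would ignore the new failure, so the second round plans from the freshly sensed set (`cache` and `db`) instead. Bounding the number of rounds keeps a cascade of secondary failures from looping forever. A real system would also need to ensure that partially executed steps leave components in a consistent state, which this sketch does not attempt.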