A planning based approach to failure recovery in distributed systems

  • Authors:
  • Naveed Arshad;Dennis Heimbigner;Alexander L. Wolf

  • Affiliations:
  • University of Colorado, Boulder, CO;University of Colorado, Boulder, CO;University of Colorado, Boulder, CO

  • Venue:
  • WOSS '04 Proceedings of the 1st ACM SIGSOFT workshop on Self-managed systems
  • Year:
  • 2004

Abstract

Failure recovery in distributed systems poses a difficult challenge because of the requirement for high availability. Failure scenarios are usually unpredictable and cannot easily be anticipated. In this research we propose a planning-based approach to failure recovery. This approach automates failure recovery by capturing the state after failure, defining an acceptable recovered state as a goal, and applying planning to get from the initial (failed) state to the goal state. By using planning, this approach can recover from a variety of failed states and reach any of several acceptable states, ranging from minimal functionality to complete recovery.
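
The abstract does not say which planner or modeling language the authors use, so the following is only a minimal sketch of the general idea, assuming a STRIPS-style model (states as sets of facts, actions with preconditions and add/delete effects) and a plain breadth-first search in place of a real planner. The scenario, fact names, and action names (`backup_up`, `start_db_on_backup`, and so on) are hypothetical and chosen purely for illustration.

```python
from collections import deque
from typing import NamedTuple


class Action(NamedTuple):
    """A hypothetical STRIPS-style recovery action (illustrative only)."""
    name: str
    preconditions: frozenset   # facts that must hold before the action
    add_effects: frozenset     # facts the action makes true
    delete_effects: frozenset  # facts the action makes false


def plan(initial, goal, actions):
    """Breadth-first search from the failed state to any state satisfying goal.

    Returns a shortest list of action names, or None if the goal is unreachable.
    """
    start = frozenset(initial)
    goal = frozenset(goal)
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        state, steps = queue.popleft()
        if goal <= state:  # every goal fact holds in this state
            return steps
        for action in actions:
            if action.preconditions <= state:
                successor = (state - action.delete_effects) | action.add_effects
                if successor not in visited:
                    visited.add(successor)
                    queue.append((successor, steps + [action.name]))
    return None


# Assumed scenario: the primary host has failed, a backup host is available,
# and the web tier is still running but has lost its database.
ACTIONS = [
    Action("start_db_on_backup",
           frozenset({"backup_up"}), frozenset({"db_running"}), frozenset()),
    Action("reconnect_web_to_db",
           frozenset({"web_running", "db_running"}),
           frozenset({"web_connected"}), frozenset()),
]

failed_state = {"backup_up", "web_running"}       # state captured after failure
minimal_goal = {"db_running"}                     # minimal functionality
full_goal = {"db_running", "web_connected"}       # complete recovery

minimal_plan = plan(failed_state, minimal_goal, ACTIONS)
full_plan = plan(failed_state, full_goal, ACTIONS)
no_plan = plan(failed_state, {"primary_up"}, ACTIONS)  # no action restores the primary
```

In this sketch the same failed state yields different plans depending on which acceptable goal is chosen, which mirrors the range from minimal functionality to complete recovery described above. A goal that no sequence of actions can reach makes `plan` return `None`.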