A framework for automated fault recovery planning in large-scale virtualized infrastructures

  • Authors:
  • Feng Liu; Vitalian A. Danciu; Pavlo Kerestey

  • Affiliations:
  • Munich Network Management Team, Ludwig-Maximilians-Universität, München; Munich Network Management Team, Ludwig-Maximilians-Universität, München; Technische Universität München

  • Venue:
  • MACE'10 Proceedings of the 5th IEEE international conference on Modelling autonomic communication environments
  • Year:
  • 2010

Abstract

Multi-layered provisioning architectures such as those in emergent virtualized (e.g. cloud) infrastructures exacerbate the cost of faults to a degree where automation effectively constitutes a prerequisite for operations. The acquisition of management information and the execution of routine tasks have been automated to some degree; however, the decision processes behind fault management in large-scale environments have not. This paper addresses the automation of such decision processes by proposing a planning-based fault recovery algorithm built on hierarchical task networks, together with data models for the knowledge necessary for the recovery process. We embed these concepts in a generic architecture and evaluate its prototypical implementation with respect to function and scalability.