Multi-layered provisioning architectures, such as those in emerging virtualized (e.g., cloud) infrastructures, increase the cost of faults to the point where automation is effectively a prerequisite for operations. The acquisition of management information and the execution of routine tasks have been automated to some degree; however, the decision processes behind fault management in large-scale environments have not. This paper addresses the automation of such decision processes by proposing a planning-based fault recovery algorithm built on hierarchical task networks, together with data models for the knowledge the recovery process requires. We embed these concepts in a generic architecture and evaluate its prototype implementation with respect to function and scalability.