When managing cloud resources, many administrators operate without a safety net. For instance, inadvertently deleting a virtual disk results in the complete loss of the data it contains. The facility to undo a collection of changes, reverting to a previous acceptable state, is widely recognized as valuable support for dependability. In this paper, we consider the particular needs of system administrators who manage API-controlled resources, such as cloud resources at the IaaS level. We propose an approach based on an abstract model of the effects of each available operation. Using this model, we check the degree to which each operation is undoable. A positive outcome of this check is a formal guarantee that any sequence of calls to such operations can be undone. A negative outcome identifies the properties that prevent undoability, e.g., which operations are not undoable and why. At runtime, we can then warn a user who is about to invoke an irreversible operation; if undo is possible and desired, we apply an AI planning technique to automatically create a workflow that takes the system back to the desired earlier state. We demonstrate the feasibility and applicability of the approach with a prototypical implementation and a number of experiments.
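
The idea of modeling operation effects and searching for an undo workflow can be illustrated with a minimal Python sketch. It is not the paper's implementation: the operation names, the fact names, the STRIPS-like encoding (preconditions, added facts, deleted facts), and the breadth-first search standing in for an AI planner are all assumptions made for illustration. The check here is also per state, which is much weaker than the formal guarantee described above.

```python
from collections import deque
from typing import NamedTuple, Optional


class Operation(NamedTuple):
    """Hypothetical abstract model of one API operation (STRIPS-like)."""
    name: str
    pre: frozenset      # facts that must hold before the call
    add: frozenset      # facts the call makes true
    delete: frozenset   # facts the call makes false


def applicable(op: Operation, state: frozenset) -> bool:
    return op.pre <= state


def apply_op(op: Operation, state: frozenset) -> frozenset:
    return (state - op.delete) | op.add


def plan(start: frozenset, goal: frozenset, ops) -> Optional[list]:
    """Breadth-first search for a shortest operation sequence from start to goal.

    Returns a list of operation names, or None if the goal is unreachable.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        for op in ops:
            if applicable(op, state):
                nxt = apply_op(op, state)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, path + [op.name]))
    return None


def undo_plan_after(op: Operation, state: frozenset, ops) -> Optional[list]:
    """Workflow that restores `state` after calling `op`, or None if irreversible."""
    if not applicable(op, state):
        raise ValueError(f"{op.name} is not applicable in the given state")
    return plan(apply_op(op, state), state, ops)


# Assumed toy model: one VM and one virtual disk. No operation can add
# "disk_data" again, so deleting the disk cannot be undone in this model.
F = frozenset
START_VM = Operation("start_vm", F({"vm_stopped"}), F({"vm_running"}), F({"vm_stopped"}))
STOP_VM = Operation("stop_vm", F({"vm_running"}), F({"vm_stopped"}), F({"vm_running"}))
CREATE_DISK = Operation("create_disk", F({"no_disk"}), F({"disk_exists"}), F({"no_disk"}))
DELETE_DISK = Operation("delete_disk", F({"disk_exists"}), F({"no_disk"}),
                        F({"disk_exists", "disk_data"}))
OPS = [START_VM, STOP_VM, CREATE_DISK, DELETE_DISK]
INITIAL = F({"vm_stopped", "disk_exists", "disk_data"})
```

In this toy model, `undo_plan_after(START_VM, INITIAL, OPS)` yields the one-step workflow `["stop_vm"]`, whereas `undo_plan_after(DELETE_DISK, INITIAL, OPS)` yields `None`, which is the point at which a tool could warn the administrator before the call is made.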