FATE and DESTINI: a framework for cloud recovery testing
Proceedings of the 8th USENIX conference on Networked systems design and implementation
Cloud API issues: an empirical study and impact
Proceedings of the 9th international ACM Sigsoft conference on Quality of software architectures
Detecting cloud provisioning errors using an annotated process model
Proceedings of the 8th Workshop on Middleware for Next Generation Internet Computing
Many cloud application failures happen during sporadic operations such as upgrade, deployment reconfiguration, migration, and scaling out/in. Most of these failures are caused by operator and process errors [1]. From a cloud consumer's perspective, recovery from these failures relies on the limited control and visibility that cloud providers offer. In addition, a large-scale system often has multiple operation processes running simultaneously, which complicates error diagnosis and recovery. Existing built-in or infrastructure-based recovery mechanisms often assume random component failures and use checkpoint-based rollback, compensation actions [2], redundancy, and rejuvenation to handle recovery [3]. These mechanisms do not consider the characteristics of a specific operation process, which consists of a set of steps carried out by scripts and humans interacting with fragile cloud infrastructure APIs and uncertain resources [4]. Other approaches, such as FATE/DESTINI [5], look at the process implied by a system's internal protocols and rely on the built-in recovery protocol to detect and recover from bugs. The problem we target lies at a different level: the external sporadic activities that operate on a hosted cloud application.