Cloud API issues: an empirical study and impact

Authors:
Qinghua Lu;Liming Zhu;Len Bass;Xiwei Xu;Zhanwen Li;Hiroshi Wada
Affiliations:
NICTA, Sydney, Australia;NICTA, Sydney, Australia;NICTA, Sydney, Australia;NICTA, Sydney, Australia;NICTA, Sydney, Australia;NICTA, Sydney, Australia
Venue:
Proceedings of the 9th international ACM Sigsoft conference on Quality of software architectures
Year:
2013

References (9):
Basic Concepts and Taxonomy of Dependable and Secure Computing. IEEE Transactions on Dependable and Secure Computing.
Characterizing cloud computing hardware reliability. Proceedings of the 1st ACM symposium on Cloud computing.
Availability in globally distributed storage systems. OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation.
Understanding network failures in data centers: measurement, analysis, and implications. Proceedings of the ACM SIGCOMM 2011 conference.
An empirical study on configuration errors in commercial and open source systems. SOSP '11 Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles.
An Extensible Framework for Improving a Distributed Software System's Deployment Architecture. IEEE Transactions on Software Engineering.
Workflow resource patterns: identification, representation and tool support. CAiSE'05 Proceedings of the 17th international conference on Advanced Information Systems Engineering.
Workflow exception patterns. CAiSE'06 Proceedings of the 18th international conference on Advanced Information Systems Engineering.
The tail at scale. Communications of the ACM.

Cited by (1):
Process-oriented recovery for operations on cloud applications. Proceedings of the 4th annual Symposium on Cloud Computing.

Abstract

Outages of cloud infrastructure have been widely publicized, and it would be easy to conclude that application developers need to be concerned only with large-scale outages of cloud provider infrastructure. Unfortunately, this is not the case. In-cloud applications rely heavily on cloud infrastructure APIs (directly, or indirectly through scripts and consoles) for many sporadic activities such as deployment changes, scaling out/in, backup, recovery, and migration. Failures and other issues around API calls are a large source of faults that can lead to application failures, especially during sporadic activities. API-related issues can also greatly exacerbate infrastructure outages. In this paper we present an empirical study of issues in Amazon EC2 APIs. The major findings include: 1) A majority (60%) of the API failure cases involve "stuck" or unresponsive API calls. 2) A significant portion (12%) involve slow-responding API calls. 3) 19% involve output issues of API calls, including failed calls with unclear error messages as well as missing, wrong, or unexpected output. 4) 9% report calls that were expected to perform an action and change state but either stayed pending for some time and then returned to the original state without properly informing the caller, or were first reported as successful and failed later. We also classify the causes of API issues and discuss their impact on application architectures.
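
To make the four failure classes in the abstract concrete, the following is a minimal client-side sketch of how a caller might guard a state-changing API call. It is an illustration only, not the method used in the study, and it does not call any real EC2 client. The names `Outcome`, `classify_call`, `call`, `get_state`, and all threshold values are hypothetical and chosen for this example.

```python
import threading
import time
from enum import Enum


class Outcome(Enum):
    OK = "ok"
    STUCK = "stuck or unresponsive call"
    SLOW = "slow response"
    OUTPUT_ISSUE = "missing, wrong, or unclear output"
    REVERTED = "state change did not hold"


def classify_call(call, get_state, expected_state,
                  slow_after=1.0, stuck_after=5.0,
                  settle_time=2.0, poll_interval=0.1):
    """Run call() under a deadline, then check that the state change held.

    call and get_state are caller-supplied stand-ins for an API request
    and a state query; all thresholds are assumptions, not measured values.
    """
    box = {}

    def target():
        try:
            box["value"] = call()
        except Exception as exc:  # keep the error for the caller's thread
            box["error"] = exc

    # A daemon thread lets the caller give up on a call that never returns.
    worker = threading.Thread(target=target, daemon=True)
    started = time.monotonic()
    worker.start()
    worker.join(stuck_after)
    if worker.is_alive():
        return Outcome.STUCK
    elapsed = time.monotonic() - started

    # Simplification: any exception or empty result counts as an output issue.
    if "error" in box or box.get("value") is None:
        return Outcome.OUTPUT_ISSUE

    # Poll for a while: the state must reach the target and stay there.
    reached = False
    deadline = time.monotonic() + settle_time
    while time.monotonic() < deadline:
        if get_state() == expected_state:
            reached = True
        elif reached:
            return Outcome.REVERTED  # reported success, then failed later
        time.sleep(poll_interval)
    if not reached:
        return Outcome.REVERTED  # stayed pending or fell back silently

    return Outcome.SLOW if elapsed > slow_after else Outcome.OK
```

The design choice worth noting is that a returned call is not trusted on its own: the wrapper keeps polling the observed state for a settling period, since the fourth finding describes calls that appear to succeed and then silently fall back.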