Adaptive Application Scaling for Improving Fault-Tolerance and Availability in the Cloud

Authors:
Ganesan Radhakrishnan
Affiliations:
Alcatel-Lucent's Software, Services & Solutions unit, Columbus, Ohio
Venue:
Bell Labs Technical Journal
Year:
2012

Abstract

In cloud environments, faults and run-time anomalies in the infrastructure can exhaust resources and impact the performance of all applications that share them. A resource monitoring strategy alone is inadequate, since snapshots of resource usage cannot provide any guarantee of application performance. This paper outlines an approach that enables an application to leverage the vast capacity and elasticity of the cloud to mitigate the deleterious effects of resource exhaustion at a node. Using queuing techniques, it models the application as a network of servers and the flow dynamics of request streams as continuous functions of time. The strategy is to compute, for each server, the mean flow rate and the mean holding time, and to use these values to decide among: a) redirecting the flow to another server, b) requesting additional resources from the cloud infrastructure, c) spawning additional server instances, or d) combining server instances to conserve resources. This dynamic reconfigurability through scaling improves application fault-tolerance, availability, and resource utilization. © 2012 Alcatel-Lucent. © 2012 Wiley Periodicals, Inc.
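The per-server decision described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: it estimates the mean flow rate λ as arrivals per observation window and the mean holding time W as a sample mean, then applies Little's law (L = λW) to obtain a utilization ρ = λW / c for a server that can hold c concurrent requests. The thresholds (0.8 and 0.3), the names `ServerStats`, `Action`, and `decide`, and the order in which options a) to d) are tried are all hypothetical choices made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from statistics import fmean


class Action(Enum):
    REDIRECT = "a) redirect flow to another server"
    REQUEST_RESOURCES = "b) request additional resources from the cloud"
    SPAWN = "c) spawn an additional server instance"
    COMBINE = "d) combine server instances"
    NONE = "no change"


@dataclass
class ServerStats:
    """Observed statistics for one server (hypothetical structure)."""
    name: str
    arrivals: int                   # requests that arrived during the window
    window: float                   # observation window length, seconds
    capacity: int                   # concurrent requests the server can hold (assumed)
    holding_times: list[float] = field(default_factory=list)  # seconds per request

    def mean_flow_rate(self) -> float:
        """lambda: mean arrival rate in requests per second."""
        return self.arrivals / self.window

    def mean_holding_time(self) -> float:
        """W: mean time a request occupies the server, in seconds."""
        return fmean(self.holding_times) if self.holding_times else 0.0

    def utilization(self) -> float:
        """rho = L / c, where L = lambda * W by Little's law."""
        return self.mean_flow_rate() * self.mean_holding_time() / self.capacity


def decide(server: ServerStats, peers: list[ServerStats], node_has_headroom: bool,
           instances: int, high: float = 0.8, low: float = 0.3) -> Action:
    """Pick one of the four reconfiguration options (hypothetical policy)."""
    rho = server.utilization()
    if rho > high:
        # Concurrent requests above the high-water mark on this server.
        excess = (rho - high) * server.capacity
        # a) Redirect if a peer has enough spare concurrency below its own mark
        #    (simplification: assumes similar holding times on the peer).
        for peer in peers:
            if (high - peer.utilization()) * peer.capacity >= excess:
                return Action.REDIRECT
        # b) Otherwise grow this node if the infrastructure can still give it more.
        if node_has_headroom:
            return Action.REQUEST_RESOURCES
        # c) Otherwise add another instance of the server.
        return Action.SPAWN
    # d) Consolidate lightly loaded instances to conserve resources.
    if rho < low and instances > 1:
        return Action.COMBINE
    return Action.NONE


if __name__ == "__main__":
    busy = ServerStats("web-1", arrivals=900, window=60.0, capacity=10,
                       holding_times=[0.6] * 5)
    idle = ServerStats("web-2", arrivals=240, window=60.0, capacity=10,
                       holding_times=[0.5] * 4)
    print(busy.name, round(busy.utilization(), 2), decide(busy, [idle], True, 2))
```

Trying redirection first keeps the change local to the application before asking the infrastructure for more capacity; a real deployment would likely tune the thresholds and add hysteresis so that servers near a threshold do not oscillate between scaling up and combining.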