Response Time Reliability in Cloud Environments: An Empirical Study of n-Tier Applications at High Resource Utilization

  • Authors:
  • Qingyang Wang; Yasuhiko Kanemasa; Jack Li; Deepal Jayasinghe; Motoyuki Kawaba; Calton Pu

  • Venue:
  • SRDS '12 Proceedings of the 2012 IEEE 31st Symposium on Reliable Distributed Systems
  • Year:
  • 2012

Abstract

When running mission-critical web-facing applications (e.g., electronic commerce) in cloud environments, predictable response time, e.g., as specified in service level agreements (SLAs), is a major performance reliability requirement. Through extensive measurements of n-tier application benchmarks in a cloud environment, we study three factors that significantly affect the predictability of application response time: bursty workloads (typical of web-facing applications), soft resource management strategies (e.g., a global thread pool or local thread pools), and bursts in system software consumption of hardware resources (e.g., Java Virtual Machine garbage collection). Using a set of profit-based performance criteria derived from typical SLAs, we show that response time reliability is brittle, with large response time variations (on the order of several seconds) caused by each of these factors. For example, for the same workload and hardware platform, different but apparently reasonable soft resource management strategies may result in profit differences of 26%. Similarly, modest increases in workload burstiness may result in profit drops of more than 50%. Our study shows that the performance reliability of large-scale distributed applications is a significant and interesting research challenge. Furthermore, our results show that profit-based performance criteria may contribute significantly to delimiting the boundaries of performance unreliability and thus support effective management of clouds.
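The abstract does not spell out the profit model, so the following is only a rough illustration of how a profit-based criterion derived from an SLA might score a measured run. It assumes a hypothetical tiered SLA in which each request earns a fixed revenue if its response time falls within a tier bound and incurs a penalty if it exceeds every bound. The tier bounds, revenues, penalty, and the names `TIERS`, `PENALTY`, `request_value`, `total_profit`, and `profit_drop` are illustrative assumptions, not the authors' parameters or code. The point it shows is that moving a modest fraction of requests into a multi-second tail can cut profit sharply even while most requests stay fast.

```python
from typing import Sequence, Tuple

# Hypothetical SLA tiers: (upper bound on response time in seconds, revenue per request).
# These values are illustrative assumptions, not parameters from the paper.
TIERS: Tuple[Tuple[float, float], ...] = ((0.2, 1.0), (1.0, 0.5), (2.0, 0.0))
# Hypothetical penalty charged for a request slower than the last tier bound.
PENALTY = -1.0


def request_value(rt: float, tiers=TIERS, penalty=PENALTY) -> float:
    """Return the revenue earned (or penalty paid) for one request's response time."""
    for bound, value in tiers:
        if rt <= bound:
            return value
    return penalty


def total_profit(response_times: Sequence[float], tiers=TIERS, penalty=PENALTY) -> float:
    """Sum the per-request value over all requests measured in one run."""
    return sum(request_value(rt, tiers, penalty) for rt in response_times)


def profit_drop(baseline: float, other: float) -> float:
    """Relative profit drop of `other` versus `baseline`, in percent."""
    return 100.0 * (baseline - other) / baseline


# Synthetic response times (seconds), made up for illustration only.
steady = [0.05] * 90 + [0.5] * 10
bursty = [0.05] * 70 + [0.5] * 10 + [3.0] * 20  # 20% of requests pushed into a long tail

print(total_profit(steady))  # 95.0
print(total_profit(bursty))  # 55.0
print(round(profit_drop(total_profit(steady), total_profit(bursty)), 1))  # 42.1
```

Scoring requests by tiers rather than by mean response time is what makes such a criterion sensitive to the long tail: in this synthetic case the bursty run's fast requests still dominate numerically, yet its profit falls by roughly 42%.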