Simultaneously achieving good performance and high resource utilization is an important goal for production cloud environments. Through extensive measurements of an n-tier application benchmark (RUBBoS), we show that system response time frequently exhibits large-scale fluctuations (e.g., ranging from tens of milliseconds up to tens of seconds) during periods of high resource utilization. Beyond bursty client workloads, we found that these large-scale response time fluctuations can be caused by system environmental conditions (e.g., L2 cache misses, JVM garbage collection, inefficient scheduling policies) that commonly exist in n-tier applications. Because of the complex resource dependencies in the system, such conditions can greatly amplify end-to-end response time fluctuations; for instance, a 50ms response time increase in the database tier can be amplified into a 500ms end-to-end response time increase. We evaluate three heuristics that stabilize response time fluctuations while still achieving high resource utilization in the system. Our results show that large-scale response time fluctuations should be taken into account when designing effective, autonomously self-scaling n-tier systems in the cloud.
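The amplification effect described above can be illustrated with a toy queueing sketch (not from the paper; all numbers and names here are illustrative assumptions): an upstream tier holds a fixed pool of worker threads, each blocked on a database call for the duration of that call. When the DB tier slows down slightly near saturation, threads are held longer, the pool saturates, and requests queue, so the end-to-end increase far exceeds the per-call increase.

```python
import heapq

def simulate(db_service_ms, pool_size=10, arrival_interval_ms=6.0, n_requests=2000):
    """Mean end-to-end response time (ms) for a fixed thread pool in front of a
    database with deterministic per-request service time and fixed arrival rate.
    A hypothetical toy model, not the paper's experimental setup."""
    free_at = [0.0] * pool_size           # time at which each worker thread frees up
    heapq.heapify(free_at)
    total = 0.0
    for i in range(n_requests):
        arrival = i * arrival_interval_ms
        start = max(arrival, heapq.heappop(free_at))  # request queues if all threads busy
        finish = start + db_service_ms                # thread held for the whole DB call
        heapq.heappush(free_at, finish)
        total += finish - arrival
    return total / n_requests

base = simulate(db_service_ms=50)   # pool utilization ~83%: no queueing, RT = 50ms
slow = simulate(db_service_ms=70)   # +20ms in the DB tier saturates the pool
```

With the chosen parameters, a 20ms increase in DB service time pushes the pool past its capacity (10 threads / 70ms < one arrival per 6ms), so mean end-to-end response time grows from 50ms to well over 500ms — an order-of-magnitude amplification of a small tier-level change.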