Cuanta: quantifying effects of shared on-chip resource interference for consolidated virtual machines

Authors:
Sriram Govindan;Jie Liu;Aman Kansal;Anand Sivasubramaniam
Affiliations:
The Pennsylvania State University;Microsoft Research Redmond;Microsoft Research Redmond;The Pennsylvania State University
Venue:
Proceedings of the 2nd ACM Symposium on Cloud Computing
Year:
2011

Abstract

Workload consolidation is attractive for cloud platforms for several reasons, including reduced infrastructure costs, lower energy consumption, and ease of management. Advances in virtualization hardware and software continue to improve resource isolation among consolidated workloads, but one form of resource interference has yet to see a widely adopted commercial solution: interference due to shared processor caches. Existing solutions for handling cache interference require new hardware features, require extensive software changes, or reduce overall throughput. A crucial requirement for effective consolidation is the ability to predict the impact of cache interference among consolidated workloads. In this paper, we present a practical technique for predicting performance interference due to shared processor caches; it works on current processor architectures and requires minimal software changes. While performance degradation can be measured empirically for a given placement of consolidated workloads, the number of possible placements grows exponentially with the number of workloads, so measuring the degradation of every possible placement is not practical. Our technique predicts the degradation for any possible placement using only a linear number of measurements, and can be used to select the most efficient consolidation pattern for the required performance and resource constraints. It achieves an average prediction error of less than 4% across a wide variety of benchmark workloads, using the Xen VMM on Intel Core 2 Duo and Nehalem quad-core processor platforms. We also illustrate the usefulness of our prediction technique in making better workload placement decisions for given performance and resource cost objectives.
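
The gap between the number of placements and the number of measurements can be illustrated with a minimal Python sketch. It is not the technique from the paper, whose details the abstract does not give. It assumes, purely for illustration, that each workload is profiled once to obtain a hypothetical cache "pressure" and "sensitivity", and that the slowdown of one workload next to another is the product of the two. All workload names, numbers, and function names below are made up.

```python
from math import factorial

# Hypothetical profiles, one measurement pass per workload (linear cost).
# "pressure": how hard the workload pushes on the shared cache (0..1).
# "sensitivity": fractional slowdown per unit of co-runner pressure.
# These values are invented for illustration only.
PROFILES = {
    "web":   {"pressure": 0.2, "sensitivity": 0.10},
    "db":    {"pressure": 0.7, "sensitivity": 0.30},
    "batch": {"pressure": 0.9, "sensitivity": 0.05},
    "cache": {"pressure": 0.4, "sensitivity": 0.40},
}


def predicted_degradation(victim, aggressor, profiles=PROFILES):
    """Assumed model: slowdown = victim sensitivity * aggressor pressure."""
    return profiles[victim]["sensitivity"] * profiles[aggressor]["pressure"]


def pairings(names):
    """Yield every way to split an even-length list into unordered pairs,
    i.e. every placement of workloads onto two-core shared-cache sockets."""
    if not names:
        yield []
        return
    first, rest = names[0], names[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail


def placement_cost(placement, profiles=PROFILES):
    """Total predicted slowdown, summed over both workloads in each pair."""
    return sum(
        predicted_degradation(a, b, profiles) + predicted_degradation(b, a, profiles)
        for a, b in placement
    )


def best_placement(names, profiles=PROFILES):
    """Pick the pairing with the lowest total predicted slowdown."""
    return min(pairings(list(names)), key=lambda p: placement_cost(p, profiles))


def count_pairings(n):
    """Number of ways to split n workloads (n even) into pairs."""
    return factorial(n) // (2 ** (n // 2) * factorial(n // 2))


if __name__ == "__main__":
    print("profiles measured:", len(PROFILES))
    print("placements scored:", count_pairings(len(PROFILES)))
    print("best placement:", best_placement(PROFILES))
```

With these made-up profiles, four measurements are enough to score all three pairings, and the count of pairings grows quickly as workloads are added (105 for eight workloads), which is why measuring every placement directly does not scale.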