Fair Cache Sharing and Partitioning in a Chip Multiprocessor Architecture
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Enterprise IT Trends and Implications for Architecture Research
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Pin: building customized program analysis tools with dynamic instrumentation
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Managing Distributed, Shared L2 Caches through OS-Level Page Allocation
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
QoS policies and architecture for cache/memory in CMP platforms
Proceedings of the 2007 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Improving Performance Isolation on Chip Multiprocessors via an Operating System Scheduler
PACT '07 Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques
A Framework for Providing Quality of Service in Chip Multi-Processors
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Adaptive set pinning: managing shared caches in chip multiprocessors
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
PAM: a novel performance/power aware meta-scheduler for multi-core systems
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Analysis and approximation of optimal co-scheduling on chip multiprocessors
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
A compiler-directed data prefetching scheme for chip multiprocessors
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
PowerNap: eliminating server idle power
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Rate-based QoS techniques for cache/memory in CMP platforms
Proceedings of the 23rd international conference on Supercomputing
Scenario Based Optimization: A Framework for Statically Enabling Online Optimizations
Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
Optimizing shared cache behavior of chip multiprocessors
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Does cache sharing on modern CMP matter to the performance of contemporary multithreaded programs?
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Addressing shared resource contention in multicore processors via scheduling
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Q-clouds: managing performance interference effects for QoS-aware clouds
Proceedings of the 5th European conference on Computer systems
Contention aware execution: online contention detection and response
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Cache topology aware computation mapping for multicores
PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
An approach to resource-aware co-scheduling for CMPs
Proceedings of the 24th ACM International Conference on Supercomputing
On mitigating memory bandwidth contention through bandwidth-aware scheduling
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Hardware execution throttling for multi-core resource management
USENIX'09 Proceedings of the 2009 conference on USENIX Annual technical conference
Reducing Cache Pollution Through Detection and Elimination of Non-Temporal Memory Accesses
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
All-window profiling and composable models of cache sharing
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Dynamic cache contention detection in multi-threaded applications
Proceedings of the 7th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
The impact of memory subsystem resource sharing on datacenter applications
Proceedings of the 38th annual international symposium on Computer architecture
Bubble-Up: increasing utilization in modern warehouse scale computers via sensible co-locations
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Combining locality analysis with online proactive job co-scheduling in chip multiprocessors
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
Automated locality optimization based on the reuse distance of string operations
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
MAO -- An extensible micro-architectural optimizer
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
ReQoS: reactive static/dynamic compilation for QoS in warehouse scale computers
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Bubble-flux: precise online QoS management for increased utilization in warehouse scale computers
Proceedings of the 40th Annual International Symposium on Computer Architecture
Whare-map: heterogeneity in "homogeneous" warehouse-scale computers
Proceedings of the 40th Annual International Symposium on Computer Architecture
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Librando: transparent code randomization for just-in-time compilers
Proceedings of the 2013 ACM SIGSAC conference on Computer & communications security
An empirical model for predicting cross-core performance interference on multicore processors
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Hi-index | 0.00 |
As the class of datacenters recently coined as warehouse scale computers (WSCs) continues to leverage commodity multicore processors with increasing core counts, there is a growing need to consolidate various workloads on these machines to fully utilize their computation power. However, it is well known that when multiple applications are co-located on a multicore machine, contention for shared memory resources can cause severe cross-core performance interference. To ensure that the quality of service (QoS) of user-facing applications does not suffer from performance interference, WSC operators resort to disallowing co-location of latency-sensitive applications with other applications. This policy translates to low machine utilization and millions of dollars wasted in WSCs. This paper presents QoS-Compile, the first compilation approach that statically manipulates application contentiousness to enable the co-location of applications with varying QoS requirements, and as a result, can greatly improve machine utilization. Our technique first pinpoints an application's code regions that tend to cause contention and performance interference. QoS-Compile then transforms those regions to reduce their contentious nature. In essence, to co-locate applications of different QoS priorities, our compilation technique uses pessimizing transformations to throttle down the memory access rate of the contentious regions in low priority applications to reduce their interference to high priority applications. Our evaluation using synthetic benchmarks, SPEC benchmarks and large-scale Google applications show that QoS-Compile can greatly reduce contention, improve QoS of applications, and improve machine utilization. Our experiments show that our technique improves applications' QoS performance by 21% and machine utilization by 36% on average.