Directly characterizing cross core interference through contention synthesis

Authors:
Jason Mars;Lingjia Tang;Mary Lou Soffa
Affiliations:
University of Virginia;University of Virginia;University of Virginia
Venue:
Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
Year:
2011

Abstract

In this paper, we present a direct methodology and framework for measuring and characterizing an application's sensitivity to cross-core interference on multicore microarchitectures. Whereas prior work uses indirect indicators, such as last-level cache miss rate, to infer this sensitivity, our approach is direct: it characterizes sensitivity using the performance impact of actual contention. Our methodology and framework, the Cross-core interference Profiling Environment (CiPE), is composed of a lightweight runtime environment on which a host application runs, along with a carefully designed contention synthesis engine that executes on a neighboring core. CiPE manipulates the co-running contention synthesis engine while monitoring and analyzing the resulting dynamic impact on the host application. CiPE can characterize the cross-core interference sensitivity of an entire application, its individual phases, or source-level code regions. To demonstrate the effectiveness of CiPE, we use its characterizations to address two pressing problems. First, we perform contention-conscious batch scheduling that minimizes cross-core interference, resulting in a 12% performance improvement on average when applied to the SPEC2006 benchmark suite, and beyond 20% in the case of mcf and omnetpp. Second, we use CiPE to design a performance analysis tool that is capable of identifying contentious bottlenecks in application code.
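The idea of a direct sensitivity measurement feeding a batch scheduler can be illustrated with a minimal sketch. It is not CiPE's implementation. It assumes sensitivity is scored as the fractional slowdown of a host application when a synthetic contender runs on a neighboring core, and it assumes a simple greedy heuristic that co-locates the most sensitive job with the least sensitive one. The function names, job names, and timing values are hypothetical and are not taken from the paper.

```python
def sensitivity(solo_time, contended_time):
    """Fractional slowdown of the host when the contender co-runs.

    Assumption: both arguments are wall-clock run times in the same unit,
    measured without and with the synthetic contender on a neighboring core.
    """
    if solo_time <= 0:
        raise ValueError("solo_time must be positive")
    return (contended_time - solo_time) / solo_time


def pair_jobs(scores):
    """Greedily pair the most sensitive job with the least sensitive one.

    `scores` maps a job name to its sensitivity score. Returns a list of
    (more_sensitive, less_sensitive) pairs plus any job left unpaired.
    This heuristic is an illustrative assumption, not the paper's scheduler.
    """
    ranked = sorted(scores, key=scores.get)  # ascending sensitivity
    pairs = []
    while len(ranked) >= 2:
        least = ranked.pop(0)   # least sensitive remaining job
        most = ranked.pop()     # most sensitive remaining job
        pairs.append((most, least))
    return pairs, ranked


if __name__ == "__main__":
    # Hypothetical (solo, contended) run times in seconds.
    timings = {
        "job_a": (100.0, 130.0),
        "job_b": (100.0, 105.0),
        "job_c": (100.0, 120.0),
        "job_d": (100.0, 110.0),
    }
    scores = {name: sensitivity(s, c) for name, (s, c) in timings.items()}
    pairs, leftover = pair_jobs(scores)
    print(pairs, leftover)
```

A real profiler would also need to pin the host and the contender to cores that share a cache and repeat each measurement to reduce noise. Both steps are omitted here.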