SRP: symbiotic resource partitioning of the memory hierarchy in CMPs

Authors:
Shekhar Srikantaiah;Mahmut Kandemir
Affiliations:
Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA;Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA
Venue:
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
Year:
2010

Abstract

Many recent works on Chip Multiprocessors (CMPs) have investigated the need for intelligent shared cache partitioning, which is believed to reduce the pressure on off-chip bandwidth. Management of off-chip memory bandwidth to improve system performance and/or mitigate the performance volatility of applications has itself received considerable attention. Coordinated resource management schemes treat the interactions between cache allocation and bandwidth management as a black box, which hinders their ability to exploit the intricate inter-relationships between the two resource management strategies. In a multiprogrammed scenario, given the limited availability of on-chip cache, it is not feasible to eliminate off-chip accesses entirely. However, it is possible to mitigate the impact of the additional queueing delays at the memory controller by preventing multiple applications from exercising the off-chip bandwidth simultaneously. Therefore, from the point of view of improving system performance, it is more important to have a symbiotic resource partitioning scheme that partitions each resource based on feedback from the partitioning of the other. Symbiotic resource partitioning (SRP), proposed in this paper, avoids scenarios in which multiple applications exercise the off-chip memory bandwidth simultaneously by appropriately controlling the cache partitioning. To do so, SRP employs an empirical model built on a metric, last-level cache misses per cycle, that represents the off-chip memory bandwidth demand of the applications. The model captures the impact of cache partitioning on bandwidth demand by expressing last-level cache misses per cycle as a function of the cache allocation per application, and it is dynamically updated to account for the phase behavior of the applications. Moreover, SRP is iterative: each iteration consists of an update to the model, cache partitioning, and bandwidth partitioning, with feedback from bandwidth partitioning that updates the model. Extensive simulations with a full-system simulator and applications from the MiBench benchmark suite show that SRP leads to a significant overall improvement in system performance compared to state-of-the-art cache and bandwidth management schemes.
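The iteration described in the abstract (model update, cache partitioning, bandwidth partitioning, feedback) can be illustrated with a rough sketch. This is not the authors' implementation: the abstract does not give the form of the empirical model or the partitioning policies, so everything below is an assumption made for illustration. The names `MissModel`, `partition_cache`, `partition_bandwidth`, and `srp_iteration` are hypothetical, as are the smoothed lookup table, the greedy way allocation, and the demand-proportional bandwidth shares.

```python
from dataclasses import dataclass, field


@dataclass
class MissModel:
    """Hypothetical per-application model: cache ways -> LLC misses per cycle."""
    alpha: float = 0.5                        # smoothing factor (assumed)
    mpc: dict = field(default_factory=dict)   # ways -> estimated misses/cycle

    def update(self, ways, observed_mpc):
        # Blend the new observation into the stored estimate so the
        # model can follow application phase changes.
        old = self.mpc.get(ways, observed_mpc)
        self.mpc[ways] = (1 - self.alpha) * old + self.alpha * observed_mpc

    def predict(self, ways):
        # Fall back to the nearest allocation that has been observed.
        if not self.mpc:
            return 0.0
        nearest = min(self.mpc, key=lambda w: abs(w - ways))
        return self.mpc[nearest]


def partition_cache(models, total_ways):
    # Assumed policy: give every application one way, then hand out the
    # remaining ways one at a time to whichever application is predicted
    # to reduce its misses per cycle the most.
    alloc = {app: 1 for app in models}
    for _ in range(total_ways - len(models)):
        def gain(app):
            m = models[app]
            return m.predict(alloc[app]) - m.predict(alloc[app] + 1)
        best = max(alloc, key=gain)
        alloc[best] += 1
    return alloc


def partition_bandwidth(models, alloc):
    # Assumed policy: bandwidth shares proportional to predicted demand
    # (misses per cycle) under the chosen cache allocation.
    demand = {app: models[app].predict(w) for app, w in alloc.items()}
    total = sum(demand.values())
    if total == 0:
        return {app: 1 / len(alloc) for app in alloc}
    return {app: d / total for app, d in demand.items()}


def srp_iteration(models, alloc, observed_mpc, total_ways):
    # One iteration: feed measurements back into the model, then
    # repartition the cache and the bandwidth from the updated model.
    for app, mpc in observed_mpc.items():
        models[app].update(alloc[app], mpc)
    new_alloc = partition_cache(models, total_ways)
    shares = partition_bandwidth(models, new_alloc)
    return new_alloc, shares
```

In this sketch the feedback path is the `observed_mpc` argument: misses per cycle measured under the previous partitions are folded into the model before the next cache and bandwidth partitions are computed.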