Memory bandwidth limitations of future microprocessors
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Proceedings of the 27th annual international symposium on Computer architecture
Dynamic Partitioning of Shared Cache Memory
The Journal of Supercomputing
CQoS: a framework for enabling QoS in shared caches of CMP platforms
Proceedings of the 18th annual international conference on Supercomputing
Fair Cache Sharing and Partitioning in a Chip Multiprocessor Architecture
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Predicting Inter-Thread Cache Contention on a Chip Multi-Processor Architecture
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
MiBench: A free, commercially representative embedded benchmark suite
WWC '01 Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop
Architectural support for operating system-driven CMP cache management
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Communist, utilitarian, and capitalist cache policies on CMPs: caches as a shared resource
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
QoS policies and architecture for cache/memory in CMP platforms
Proceedings of the 2007 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Cooperative cache partitioning for chip multiprocessors
Proceedings of the 21st annual international conference on Supercomputing
Effective Management of DRAM Bandwidth in Multicore Processors
PACT '07 Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques
Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
A Framework for Providing Quality of Service in Chip Multi-Processors
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Adaptive set pinning: managing shared caches in chip multiprocessors
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Self-Optimizing Memory Controllers: A Reinforcement Learning Approach
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Prefetch-Aware DRAM Controllers
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
METE: meeting end-to-end QoS in multicores through system-wide resource management
Proceedings of the ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Studying the impact of hardware prefetching and bandwidth partitioning in chip-multiprocessors
Proceedings of the ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
METE: meeting end-to-end QoS in multicores through system-wide resource management
ACM SIGMETRICS Performance Evaluation Review - Performance evaluation review
Studying the impact of hardware prefetching and bandwidth partitioning in chip-multiprocessors
ACM SIGMETRICS Performance Evaluation Review - Performance evaluation review
Hi-index | 0.00 |
There have been many recent works in the context of Chip Multiprocessors (CMPs) investigating the need of intelligent shared cache partitioning which is believed to reduce the pressure on the off-chip bandwidth. Management of the off-chip memory bandwidth to improve system performance and/or mitigate performance volatility of applications has itself received considerable attention. Coordinated resource management schemes treat the interactions between cache allocation and bandwidth management as a black-box. This hinders the ability of these schemes from exploiting the intricate inter-relationships between the resource management strategies. In a multiprogrammed scenario, given the limited availability of the on-chip cache, it is not feasible to entirely eliminate off-chip accesses. However, it is possible to mitigate the impact of additional queueing delays associated with the memory controller by avoiding multiple applications from exercising the off-chip bandwidth simultaneously. Therefore, from the point of view of improving system performance, it is more important to have a symbiotic resource partitioning scheme that performs partitioning of each resource based on feedback it receives from the partitioning of the other. Symbiotic resource partitioning (SRP) proposed in this paper avoids the scenarios of multiple applications exercising the off-chip memory bandwidth simultaneously by appropriately controlling the cache partitioning. In order to control the cache partitioning, SRP employs an empirical model that relies on a metric (last level cache misses per cycle) that represents the off-chip memory bandwidth demand of the applications and models the impact of cache partitioning on bandwidth demand by representing the last level cache misses per cycle metric as a function of the cache allocation per application. This model is dynamically updated to account for the phase behavior of the applications. Moreover, SRP is an iterative approach wherein each iteration of the approach consists of an update to the model, cache partitioning and bandwidth partitioning with a feedback from bandwidth partitioning that updates the model. Extensive simulations with a full system simulator and applications from the MiBench benchmark suite shows that SRP leads to a significant overall improvement in system performance as compared to a state-of the-art cache and bandwidth management schemes.