Off-chip memory bandwidth minimization through cache partitioning for multi-core platforms

Authors:
Chenjie Yu;Peter Petrov
Affiliations:
University of Maryland, College Park;University of Maryland, College Park
Venue:
Proceedings of the 47th Design Automation Conference
Year:
2010

Abstract

We present a methodology for off-chip memory bandwidth minimization through application-driven L2 cache partitioning in multi-core systems. A major challenge in multi-core system design is the widening gap between the memory demand generated by the processor cores and the limited off-chip memory bandwidth and memory service speed. This gap severely restricts the number of cores that can be integrated into a multi-core system and the parallelism that can actually be achieved and efficiently exploited, not only for memory-demanding applications but also for workloads consisting of many tasks that utilize a large number of cores and thus exceed the available off-chip bandwidth. Partitioning of the shared last-level cache has been shown to be a promising technique for enhancing cache utilization and reducing miss rates. While most cache partitioning techniques focus on cache miss rates, our work takes a different approach in which tasks' memory bandwidth requirements are taken into account when identifying a cache partitioning for multi-programmed and/or multi-threaded workloads. Cache resources are allocated with the objective of minimizing the overall system bandwidth requirement for the target workload. The key insight is that cache miss-rate information may severely misrepresent the actual bandwidth demand of a task, which ultimately determines the overall system performance and power consumption.
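The key insight can be illustrated with a small sketch. The abstract does not describe the paper's partitioning algorithm, so nothing below should be read as the authors' method: the task profiles, the 64-byte line size, the five-way cache, and the exhaustive search are all assumptions made for illustration, and the bandwidth model (L2 accesses per second × miss rate × line size, ignoring write-backs) is a deliberate simplification. The sketch shows how a partition chosen to minimize summed miss rates can differ from one chosen to minimize total off-chip bandwidth when tasks access the cache at very different rates.

```python
from itertools import product

LINE_BYTES = 64   # assumed cache line size in bytes
TOTAL_WAYS = 5    # assumed number of L2 ways to distribute

# Hypothetical task profiles (not from the paper): L2 accesses per second
# and miss rate as a function of the number of ways allocated.
TASKS = {
    "A": {"accesses_per_s": 1_000_000,
          "miss_rate": {1: 0.40, 2: 0.20, 3: 0.10, 4: 0.05}},
    "B": {"accesses_per_s": 50_000_000,
          "miss_rate": {1: 0.10, 2: 0.06, 3: 0.04, 4: 0.03}},
}

def bandwidth(task, ways):
    """Estimated off-chip traffic in bytes/s (simplified: misses only)."""
    profile = TASKS[task]
    return profile["accesses_per_s"] * profile["miss_rate"][ways] * LINE_BYTES

def partitions(total_ways):
    """Yield every allocation that gives each task 1..4 ways and uses all ways."""
    names = list(TASKS)
    for alloc in product(range(1, 5), repeat=len(names)):
        if sum(alloc) == total_ways:
            yield dict(zip(names, alloc))

def total_miss_rate(alloc):
    """Miss-rate-driven cost: sum of per-task miss rates."""
    return sum(TASKS[t]["miss_rate"][w] for t, w in alloc.items())

def total_bandwidth(alloc):
    """Bandwidth-driven cost: sum of per-task off-chip traffic."""
    return sum(bandwidth(t, w) for t, w in alloc.items())

def best(cost):
    """Exhaustive search; a stand-in for whatever search a real system uses."""
    return min(partitions(TOTAL_WAYS), key=cost)

if __name__ == "__main__":
    by_miss = best(total_miss_rate)
    by_bw = best(total_bandwidth)
    print("miss-rate-driven partition:", by_miss)
    print("bandwidth-driven partition:", by_bw)
    print("traffic ratio (miss-driven / bandwidth-driven):",
          total_bandwidth(by_miss) / total_bandwidth(by_bw))
```

With these made-up numbers, task A has the steeper miss-rate curve, so a miss-rate objective gives it most of the cache. Task B issues fifty times as many accesses, so a bandwidth objective gives the ways to B instead and yields lower total off-chip traffic.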