Adaptive cache management for a combined SRAM and DRAM cache hierarchy for multi-cores

Authors:
Fazal Hameed;Lars Bauer;Jörg Henkel
Affiliations:
Karlsruhe Institute of Technology (KIT), Germany;Karlsruhe Institute of Technology (KIT), Germany;Karlsruhe Institute of Technology (KIT), Germany
Venue:
Proceedings of the Conference on Design, Automation and Test in Europe
Year:
2013

Citing 13
Cited 2

Interconnect characteristics of 2.5-D system integration scheme

Proceedings of the 2001 international symposium on Physical design
Lockup-free instruction fetch/prefetch cache organization

ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Die Stacking (3D) Microarchitecture

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Adaptive insertion policies for managing shared caches

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Hybrid cache architecture with disparate memory technologies

Proceedings of the 36th annual international symposium on Computer architecture
PIPP: promotion/insertion pseudo-partitioning of multi-core shared caches

Proceedings of the 36th annual international symposium on Computer architecture
Scaling the bandwidth wall: challenges in and avenues for CMP scaling

Proceedings of the 36th annual international symposium on Computer architecture
Extending the effectiveness of 3D-stacked DRAM caches with an adaptive multi-queue policy

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Achieving Non-Inclusive Cache Performance with Inclusive Caches: Temporal Locality Aware (TLA) Cache Management Policies

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Efficiently enabling conventional block sizes for very large die-stacked DRAM caches

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Dynamically reconfigurable hybrid cache: an energy-efficient last-level cache design

DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe

Reducing inter-core cache contention with an adaptive bank mapping policy in DRAM cache

Proceedings of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis
Simultaneously optimizing DRAM cache hit latency and miss rate via novel set mapping policies

Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

On-chip DRAM caches may alleviate the memory bandwidth problem in future multi-core architectures through reducing off-chip accesses via increased cache capacity. For memory intensive applications, recent research has demonstrated the benefits of introducing high capacity on-chip L4-DRAM as Last-Level-Cache between L3-SRAM and off-chip memory. These multi-core cache hierarchies attempt to exploit the latency benefits of L3-SRAM and capacity benefits of L4-DRAM caches. However, not taking into consideration the cache access patterns of complex applications can cause inter-core DRAM interference and inter-core cache contention. In this paper, we contest to re-architect existing cache hierarchies by proposing a hybrid cache architecture, where the Last-Level-Cache is a combination of SRAM and DRAM caches. We propose an adaptive DRAM placement policy in response to the diverse requirements of complex applications with different cache access behaviors. It reduces inter-core DRAM interference and inter-core cache contention in SRAM/DRAM-based hybrid cache architectures: increasing the harmonic mean instruction-per-cycle throughput by 23.3% (max. 56%) and 13.3% (max. 35.1%) compared to state-of-the-art.