Die-stacking technology allows conventional DRAM to be integrated with processors. While there are numerous ways to make use of such stacked DRAM, one promising option is to use it as a large cache. Although previous studies show that DRAM caches can deliver performance benefits, inefficiencies remain, along with significant hardware costs for auxiliary structures. This paper presents two innovations that exploit the bursty nature of memory requests to streamline the DRAM cache. The first is a low-cost Hit-Miss Predictor (HMP) that virtually eliminates the hardware overhead of the previously proposed multi-megabyte Miss Map structure. The second is a Self-Balancing Dispatch (SBD) mechanism that dynamically sends some requests to off-chip memory even though they may hit in the die-stacked DRAM cache, making effective use of otherwise idle off-chip bandwidth while the DRAM cache services a burst of hits. Both techniques, however, are hampered by dirty (modified) data in the DRAM cache. To ensure correctness in the presence of dirty data, the HMP must verify that a block predicted as a miss is not actually present; otherwise the dirty block must be provided. This verification adds latency, especially when the DRAM cache banks are busy. Similarly, SBD cannot redirect a request to off-chip memory when a dirty copy of the block exists in the DRAM cache. To relax these constraints, we introduce a hybrid write policy for the cache that simultaneously supports write-through and write-back policies for different pages. Only a limited number of pages are permitted to operate in write-back mode at any one time, bounding the amount of dirty data in the DRAM cache. By keeping the majority of the DRAM cache clean, most HMP predictions need not be verified, and SBD has more opportunities to redistribute requests (only requests to the limited number of dirty pages must go to the DRAM cache to maintain correctness). Our proposed techniques improve performance compared to the Miss Map-based DRAM cache approach while eliminating the costly Miss Map structure entirely.
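To make the interaction of the three mechanisms concrete, the following is a minimal, purely illustrative Python sketch (not the authors' implementation): a bounded set of write-back pages keeps the DRAM cache mostly clean, so predicted misses on clean pages need no verification and predicted hits can be redirected off-chip when the cache is busy. The names MostlyCleanDramCache, MAX_WRITEBACK_PAGES, busy_threshold, and the 4 KB page size are assumptions introduced for illustration only.

```python
from collections import namedtuple

MAX_WRITEBACK_PAGES = 64       # assumed bound on pages allowed to hold dirty data


class MostlyCleanDramCache:
    def __init__(self):
        self.writeback_pages = set()   # pages currently operating in write-back mode

    def allow_writeback(self, page):
        """Admit a page to write-back mode only while the bound is not exceeded;
        all other pages are handled write-through and therefore stay clean."""
        if page in self.writeback_pages:
            return True
        if len(self.writeback_pages) < MAX_WRITEBACK_PAGES:
            self.writeback_pages.add(page)
            return True
        return False                   # page stays write-through (clean)

    def may_be_dirty(self, page):
        return page in self.writeback_pages


def dispatch(request, cache, hit_predictor, dram_cache_queue_depth, busy_threshold=8):
    """Self-balancing dispatch: steer a request to off-chip memory when the
    DRAM cache is servicing a burst, provided correctness allows it."""
    page = request.addr >> 12          # 4 KB pages, assumed for illustration

    # Requests to pages that may hold dirty data must go to the DRAM cache.
    if cache.may_be_dirty(page):
        return "DRAM_CACHE"

    predicted_hit = hit_predictor.predict(request.addr)

    if not predicted_hit:
        # Clean page: a predicted miss can go straight off-chip without
        # verifying the prediction against the cache tags.
        return "OFF_CHIP"

    # Predicted hit on a clean page: redirect off-chip if the cache is busy,
    # trading a cache hit for otherwise idle off-chip bandwidth.
    if dram_cache_queue_depth > busy_threshold:
        return "OFF_CHIP"
    return "DRAM_CACHE"


# Small usage example with a stand-in predictor that always predicts a hit.
Request = namedtuple("Request", "addr")


class AlwaysHitPredictor:
    def predict(self, addr):
        return True


cache = MostlyCleanDramCache()
print(dispatch(Request(addr=0x12345000), cache, AlwaysHitPredictor(),
               dram_cache_queue_depth=12))   # busy cache + clean hit -> "OFF_CHIP"
```

The sketch only captures the dispatch decision; in hardware the hit-miss predictor would be a small table indexed by request address, and the write-back page set would be tracked alongside the cache's page-level metadata.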