A dual grain hit-miss detector for large die-stacked DRAM caches

Authors:
Michel El-Nacouzi;Islam Atta;Myrto Papadopoulou;Jason Zebchuk;Natalie Enright Jerger;Andreas Moshovos
Affiliations:
University of Toronto;University of Toronto;University of Toronto;University of Toronto;University of Toronto;University of Toronto
Venue:
Proceedings of the Conference on Design, Automation and Test in Europe
Year:
2013

Abstract

Die-stacked DRAM caches promise improved performance and reduced energy by capturing a larger fraction of an application's working set than on-die SRAM caches. However, because their latency is only 50% lower than that of main memory, DRAM caches considerably increase the latency of misses. They also incur a significant energy overhead for remote lookups in snoop-based multi-socket systems. Ideally, a request that will miss in the DRAM cache would be detected in advance so that it could selectively bypass the cache. This work proposes a "dual-grain filter" that correctly predicts whether an access is a hit or a miss in most cases. Experimental results with commercial and scientific workloads show that a 158KB dual-grain filter correctly predicts data block residency for 85% of all accesses to a 256MB DRAM cache. As a result, average off-die latency with the filter is within 8% of that achievable with a perfectly accurate filter, which is impractical to implement.
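The abstract does not describe how the filter works internally, so the following is only a minimal sketch of the general idea of checking residency at two granularities. It is not the design evaluated in the paper. The class name, the 4KB region and 64-byte block sizes, the table sizes, and the use of simple modulo-indexed counters are all assumptions made for illustration.

```python
class DualGrainFilterSketch:
    """Illustrative two-granularity residency filter (NOT the paper's design).

    Assumptions: 64-byte blocks, 4 KB regions, modulo-indexed counter tables.
    A zero counter proves absence; a nonzero counter may be a false positive
    caused by aliasing between addresses that share a table index.
    """

    def __init__(self, block_bits=6, region_bits=12,
                 coarse_entries=1024, fine_entries=4096):
        self.block_bits = block_bits
        self.region_bits = region_bits
        self.coarse = [0] * coarse_entries  # cached blocks per hashed region
        self.fine = [0] * fine_entries      # cached blocks per hashed block

    def _region_index(self, addr):
        return (addr >> self.region_bits) % len(self.coarse)

    def _block_index(self, addr):
        return (addr >> self.block_bits) % len(self.fine)

    def on_fill(self, addr):
        """Record that the block containing addr entered the DRAM cache."""
        self.coarse[self._region_index(addr)] += 1
        self.fine[self._block_index(addr)] += 1

    def on_evict(self, addr):
        """Record that the block containing addr left the DRAM cache."""
        self.coarse[self._region_index(addr)] -= 1
        self.fine[self._block_index(addr)] -= 1

    def predict_hit(self, addr):
        """Return False when the access can safely bypass the DRAM cache."""
        if self.coarse[self._region_index(addr)] == 0:
            return False  # no block of this (hashed) region is cached
        return self.fine[self._block_index(addr)] > 0


f = DualGrainFilterSketch()
f.on_fill(0x1000)
resident = f.predict_hit(0x1000)         # the block that was filled
neighbor = f.predict_hit(0x1040)         # same region, different block
far_away = f.predict_hit(0x200000)       # region with nothing cached
aliased = f.predict_hit(0x401000)        # shares both table indices with 0x1000
```

In this sketch a predicted miss is always correct as long as every fill and eviction is reported, while a predicted hit can be wrong when two addresses alias, as the last lookup shows. The filter in the paper is reported to be correct for 85% of accesses, so it evidently makes different trade-offs than this conservative variant.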