A Case for Direct-Mapped Caches
Computer
A modified approach to data cache management
Proceedings of the 28th annual international symposium on Microarchitecture
Using SimPoint for accurate and efficient simulation
SIGMETRICS '03 Proceedings of the 2003 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Cache miss behavior: is it √2?
Proceedings of the 3rd conference on Computing frontiers
Adaptive insertion policies for high performance caching
Proceedings of the 34th annual international symposium on Computer architecture
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Using dead blocks as a virtual victim cache
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Simple but Effective Heterogeneous Main Memory with On-Chip Memory Controller Support
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
SHiP: signature-based hit predictor for high performance caching
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Efficiently enabling conventional block sizes for very large die-stacked DRAM caches
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Enabling Efficient and Scalable Hybrid Memories Using Fine-Granularity DRAM Cache Management
IEEE Computer Architecture Letters
A dual grain hit-miss detector for large die-stacked DRAM caches
Proceedings of the Conference on Design, Automation and Test in Europe
Proceedings of the 40th Annual International Symposium on Computer Architecture
Resilient die-stacked DRAM caches
Proceedings of the 40th Annual International Symposium on Computer Architecture
Software-controlled transparent management of heterogeneous memory resources in virtualized systems
Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
Large-reach memory management unit caches
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
ARI: Adaptive LLC-memory traffic management
ACM Transactions on Architecture and Code Optimization (TACO)
Simultaneously optimizing DRAM cache hit latency and miss rate via novel set mapping policies
Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems
Hi-index | 0.00 |
This paper analyzes the design trade-offs in architecting large-scale DRAM caches. Prior research, including the recent work from Loh and Hill, have organized DRAM caches similar to conventional caches. In this paper, we contend that some of the basic design decisions typically made for conventional caches (such as serialization of tag and data access, large associativity, and update of replacement state) are detrimental to the performance of DRAM caches, as they exacerbate the already high hit latency. We show that higher performance can be obtained by optimizing the DRAM cache architecture first for latency, and then for hit rate. We propose a latency-optimized cache architecture, called Alloy Cache, that eliminates the delay due to tag serialization by streaming tag and data together in a single burst. We also propose a simple and highly effective Memory Access Predictor that incurs a storage overhead of 96 bytes per core and a latency of 1 cycle. It helps service cache misses faster without the need to wait for a cache miss detection in the common case. Our evaluations show that our latency-optimized cache design significantly outperforms both the recent proposal from Loh and Hill, as well as an impractical SRAM Tag-Store design that incurs an unacceptable overhead of several tens of megabytes. On average, the proposal from Loh and Hill provides 8.7% performance improvement, the "idealized" SRAM Tag design provides 24%, and our simple latency-optimized design provides 35%.