WADE: Writeback-aware dynamic cache management for NVM-based main memory system

Authors:
Zhe Wang;Shuchang Shan;Ting Cao;Junli Gu;Yi Xu;Shuai Mu;Yuan Xie;Daniel A. Jiménez
Affiliations:
Texas A&M University;Chinese Institute of Computing Technology;Australian National University;AMD Research;AMD Research;Tsinghua University;AMD Research/Pennsylvania State University;Texas A&M University
Venue:
ACM Transactions on Architecture and Code Optimization (TACO)
Year:
2013

Abstract

Emerging Non-Volatile Memory (NVM) technologies are being explored as potential alternatives to traditional SRAM/DRAM-based memory architectures in future microprocessor designs. One of the major disadvantages of NVM is the latency and energy overhead associated with write operations. Mitigation techniques that minimize the write overhead of NVM-based main memory have been studied extensively. However, most prior work focuses on optimizing the NVM-based main memory itself, with little attention paid to cache management policies for the Last-Level Cache (LLC). In this article, we propose a Writeback-Aware Dynamic CachE (WADE) management technique to help mitigate the write overhead in NVM-based memory. The proposal is based on the observation that, when dirty cache blocks are evicted from the LLC and written into NVM-based memory (with PCM as an example), the long latency and high energy of these write operations can degrade system performance and power efficiency. Thus, reducing the number of writeback requests from the LLC is critical. The proposed WADE technique tries to keep highly reused dirty cache blocks in the LLC. It predicts which blocks in the LLC are frequently written back. The LLC sets are dynamically partitioned into a frequent-writeback list and a nonfrequent-writeback list, and the technique maintains the best size for each list in the LLC. Our evaluation shows that the technique can reduce the number of writeback requests by 16.5% for memory-intensive single-threaded benchmarks and 10.8% for multicore workloads. It yields a geometric mean speedup of 5.1% for single-threaded applications and 7.6% for multicore workloads. Due to the reduced number of writeback requests to main memory, the technique reduces energy consumption by 8.1% for single-threaded applications and 7.6% for multicore workloads.
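
The two-list idea described in the abstract can be illustrated with a toy model of a single cache set. This is only a sketch under stated assumptions, not the mechanism from the article: the class name `PartitionedSet`, the write-count predictor with its `write_threshold`, and the fixed `frequent_target` size are all hypothetical stand-ins (the abstract says the list sizes are chosen dynamically, which is not modeled here).

```python
from collections import OrderedDict


class PartitionedSet:
    """Toy model of one LLC set split into two LRU lists.

    Every name and the write-count predictor are illustrative assumptions,
    not the design defined in the article.
    """

    def __init__(self, ways, frequent_target, write_threshold=2):
        self.ways = ways                        # total blocks the set can hold
        self.frequent_target = frequent_target  # assumed fixed size of the frequent list
        self.write_threshold = write_threshold  # assumed predictor threshold
        self.frequent = OrderedDict()           # tag -> dirty flag, LRU entry first
        self.nonfrequent = OrderedDict()        # tag -> dirty flag, LRU entry first
        self.write_counts = {}                  # tag -> number of writes seen so far
        self.writebacks = 0                     # dirty evictions sent to main memory

    def _evict(self):
        # Take the victim from the frequent list only when that list is over
        # its target size, or when the nonfrequent list has nothing to evict.
        over = len(self.frequent) > self.frequent_target
        victims = self.frequent if (over or not self.nonfrequent) else self.nonfrequent
        _, dirty = victims.popitem(last=False)  # remove the LRU entry
        if dirty:
            self.writebacks += 1

    def access(self, tag, is_write):
        """Simulate one access; return True on a hit, False on a miss."""
        if is_write:
            self.write_counts[tag] = self.write_counts.get(tag, 0) + 1
        # Remove the block from whichever list holds it, keeping its dirty bit.
        dirty = self.frequent.pop(tag, None)
        if dirty is None:
            dirty = self.nonfrequent.pop(tag, None)
        hit = dirty is not None
        # On a miss in a full set, make room before inserting the new block.
        if not hit and len(self.frequent) + len(self.nonfrequent) >= self.ways:
            self._evict()
        dirty = bool(dirty) or is_write
        # Insert at the MRU position of the list chosen by the toy predictor.
        predicted = self.write_counts.get(tag, 0) >= self.write_threshold
        (self.frequent if predicted else self.nonfrequent)[tag] = dirty
        return hit


s = PartitionedSet(ways=2, frequent_target=1)
for tag, is_write in [("A", True), ("A", True), ("B", False), ("C", False), ("D", False)]:
    s.access(tag, is_write)
print(s.writebacks)  # prints 0: the twice-written block A was never evicted
```

In this trace a plain two-way LRU set would evict the dirty block A when C arrives and pay one writeback, whereas the partitioned set sacrifices the clean blocks B and C instead. The example only shows the intuition behind preferring to retain frequently written dirty blocks; it says nothing about the reported results.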