Improving writeback efficiency with decoupled last-write prediction

Authors:
Zhe Wang;Samira M. Khan;Daniel A. Jiménez
Affiliations:
The University of Texas at San Antonio;The University of Texas at San Antonio;The University of Texas at San Antonio
Venue:
Proceedings of the 39th Annual International Symposium on Computer Architecture
Year:
2012

Citing 25
Cited 4

Trading conflict and capacity aliasing in conditional branch predictors

Proceedings of the 24th annual international symposium on Computer architecture
Memory access scheduling

Proceedings of the 27th annual international symposium on Computer architecture
Selective, accurate, and timely self-invalidation using last-touch prediction

Proceedings of the 27th annual international symposium on Computer architecture
Eager writeback - a technique for improving bandwidth utilization

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
A permutation-based page interleaving scheme to reduce row-buffer conflicts and exploit data locality

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Dead-block prediction & dead-block correlating prefetchers

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
High-Performance DRAMs in Workstation Environments

IEEE Transactions on Computers
Timekeeping in the memory system: predicting and optimizing memory behavior

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Automatically characterizing large scale program behavior

Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Memory Controller Optimizations for Web Servers

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
SPEC CPU2006 benchmark descriptions

ACM SIGARCH Computer Architecture News
Fair Queuing Memory Systems

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
A Burst Scheduling Access Reordering Mechanism

HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers

HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors

Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Self-Optimizing Memory Controllers: A Reinforcement Learning Approach

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Cache bursts: A new approach for eliminating dead blocks and increasing cache efficiency

Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Memory Systems: Cache, DRAM, Disk

Memory Systems: Cache, DRAM, Disk
Micro-pages: increasing DRAM efficiency with locality-aware data placement

Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
The virtual write queue: coordinating DRAM and last-level cache policies

Proceedings of the 37th annual international symposium on Computer architecture
Handling the problems and opportunities posed by multiple on-chip memory controllers

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Sampling Dead Block Prediction for Last-Level Caches

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
DRAMSim2: A Cycle Accurate Memory System Simulator

IEEE Computer Architecture Letters
MARSS: a full system simulator for multicore x86 CPUs

Proceedings of the 48th Design Automation Conference

Die-stacked DRAM caches for servers: hit ratio, latency, or bandwidth? have it all with footprint cache

Proceedings of the 40th Annual International Symposium on Computer Architecture
ARI: Adaptive LLC-memory traffic management

ACM Transactions on Architecture and Code Optimization (TACO)
Reducing DRAM row activations with eager read/write clustering

ACM Transactions on Architecture and Code Optimization (TACO)
WADE: Writeback-aware dynamic cache management for NVM-based main memory system

ACM Transactions on Architecture and Code Optimization (TACO)

Quantified Score

Hi-index	0.00

Visualization

Abstract

In modern DDRx memory systems, memory write requests compete with read requests for available memory resources, significantly increasing the average read request service time. Caches are used to mitigate long memory read latency that limits system performance. Dirty blocks in the last-level cache (LLC) that will not be written again before they are evicted will eventually be written back to memory. We refer to these blocks as last-write blocks. In this paper, we propose an LLC writeback technique that improves DRAM efficiency by scheduling predicted last-write blocks early. We propose a low overhead last-write predictor for the LLC. The predicted last-write blocks are made available to the memory controller for scheduling. This technique effectively re-distributes the memory requests and expands writes scheduling opportunities, allowing writes to be serviced efficiently by DRAM. The technique is flexible enough to be applied to any LLC replacement policy. Our evaluation with multi-programmed workloads shows that the technique significantly improves performance by 6.5%-11.4% on average over the traditional writeback technique in an eight-core processor with various DRAM configurations running memory intensive benchmarks.