The virtual write queue: coordinating DRAM and last-level cache policies

Authors:
Jeffrey Stuecheli; Dimitris Kaseridis; David Daly; Hillery C. Hunter; Lizy K. John
Affiliations:
The University of Texas & IBM Corp., Austin, TX, USA; The University of Texas, Austin, TX, USA; IBM, Yorktown Heights, NY, USA; IBM, Yorktown Heights, NY, USA; The University of Texas, Austin, TX, USA
Venue:
Proceedings of the 37th annual international symposium on Computer architecture
Year:
2010

Abstract

In computer architecture, caches have primarily been viewed as a means to hide memory latency from the CPU. Cache policies have focused on anticipating the CPU's data needs and are mostly oblivious to the main memory. In this paper, we demonstrate that the era of many-core architectures has created new main memory bottlenecks and mandates a new approach: coordination of cache policy with main memory characteristics. Using the cache for memory optimization purposes, we propose a Virtual Write Queue, which dramatically expands the memory controller's visibility into processor behavior at low implementation overhead. Through memory-centric modification of existing policies, such as scheduled writebacks, this paper demonstrates that the performance-limiting effects of highly threaded architectures can be overcome. We show that through awareness of the physical main memory layout and by focusing on writes, both read and write average latency can be shortened, memory power reduced, and overall system performance improved. Through full-system cycle-accurate simulations of SPEC CPU2006, we demonstrate that the proposed Virtual Write Queue achieves an average 10.9% system-level throughput improvement on memory-intensive workloads, along with an overall reduction of 8.7% in memory power across the whole suite.
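The abstract does not spell out the mechanism, so the following is only an illustrative sketch of what "scheduled writebacks" with "awareness of the physical main memory layout" could look like, not the paper's implementation. It groups dirty cache lines that are close to eviction by the DRAM bank and row they map to, so that writes to the same row can be drained together. The address mapping, the constants, and the names `dram_location` and `schedule_writebacks` are all assumptions made for the example.

```python
from collections import defaultdict
from typing import NamedTuple

# All constants below are assumptions for illustration only.
LINE_BYTES = 64       # assumed cache-line size
LINES_PER_ROW = 128   # assumed 8 KiB DRAM row
NUM_BANKS = 8         # assumed banks per rank


class DramLoc(NamedTuple):
    bank: int
    row: int


def dram_location(addr: int) -> DramLoc:
    """Hypothetical address map: consecutive lines fill one row,
    and successive row-sized chunks interleave across banks."""
    line = addr // LINE_BYTES
    row_chunk = line // LINES_PER_ROW
    return DramLoc(bank=row_chunk % NUM_BANKS, row=row_chunk // NUM_BANKS)


def schedule_writebacks(dirty_lru_addrs, min_burst=2):
    """Group dirty lines near eviction by DRAM (bank, row).

    Returns a list of (location, addresses) bursts, largest first.
    Groups smaller than min_burst are left in the cache for now.
    """
    groups = defaultdict(list)
    for addr in dirty_lru_addrs:
        groups[dram_location(addr)].append(addr)

    bursts = [
        (loc, sorted(addrs))
        for loc, addrs in groups.items()
        if len(addrs) >= min_burst
    ]
    # Largest bursts first; ties broken by location for determinism.
    bursts.sort(key=lambda b: (-len(b[1]), b[0]))
    return bursts


if __name__ == "__main__":
    dirty = [0x2040, 0x0000, 0x10000, 0x0080, 0x2000, 0x0040]
    for loc, addrs in schedule_writebacks(dirty):
        print(loc, [hex(a) for a in addrs])
```

In this sketch the lone dirty line at `0x10000` is not written back, because draining it would open a DRAM row for a single write; a real controller would also have to weigh read traffic and timing constraints, which are omitted here.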