Prefetch-Aware DRAM Controllers

Authors:
Chang Joo Lee;Onur Mutlu;Veynu Narasiman;Yale N. Patt
Affiliations:
Department of Electrical and Computer Engineering, The University of Texas at Austin, USA;Microsoft Research and Carnegie Mellon University, USA;Department of Electrical and Computer Engineering, The University of Texas at Austin, USA;Department of Electrical and Computer Engineering, The University of Texas at Austin, USA
Venue:
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Year:
2008

Citing 27
Cited 28

An effective on-chip preloading scheme to reduce data access penalty

Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Design and evaluation of a compiler algorithm for prefetching

ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Prefetching using Markov predictors

Proceedings of the 24th annual international symposium on Computer architecture
Profetching and memory system behavior of the SPEC95 benchmark suite

IBM Journal of Research and Development - Special issue: performance analysis and its impact on design
Hardware-only stream prefetching and dynamic access ordering

Proceedings of the 14th international conference on Supercomputing
Memory access scheduling

Proceedings of the 27th annual international symposium on Computer architecture
Cache Memories

ACM Computing Surveys (CSUR)
Dynamic Access Ordering for Streamed Computations

IEEE Transactions on Computers
Symbiotic jobscheduling for a simultaneous multithreaded processor

ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Reducing DRAM Latencies with an Integrated Memory Hierarchy Design

HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
AC/DC: An Adaptive Data Cache Prefetcher

Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Pinpointing Representative Portions of Large Intel® Itanium® Programs with Dynamic Instrumentation

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Adaptive History-Based Memory Schedulers

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
A Performance Comparison of DRAM Memory System Optimizations for SMT Processors

HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Chip Multithreading: Opportunities and Challenges

HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
A study of performance impact of memory controller features in multi-processor server environment

WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
Using the first-level caches as filters to reduce the pollution caused by speculative memory references

International Journal of Parallel Programming
Reducing Cache Pollution via Dynamic Data Prefetch Filtering

IEEE Transactions on Computers
Fair Queuing Memory Systems

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Memory Prefetching Using Adaptive Stream Detection

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers

HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors

Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
IBM POWER6 microarchitecture

IBM Journal of Research and Development
Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Self-Optimizing Memory Controllers: A Reinforcement Learning Approach

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
POWER4 system microarchitecture

IBM Journal of Research and Development

Buffer management for colored packets with deadlines

Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures
Coordinated control of multiple prefetchers in multi-core systems

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Improving memory bank-level parallelism in the presence of prefetching

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Rethinking DRAM design and organization for energy-constrained multi-cores

Proceedings of the 37th annual international symposium on Computer architecture
Handling the problems and opportunities posed by multiple on-chip memory controllers

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Contention-Aware Scheduling on Multicore Systems

ACM Transactions on Computer Systems (TOCS)
Software-hardware cooperative DRAM bank partitioning for chip multiprocessors

NPC'10 Proceedings of the 2010 IFIP international conference on Network and parallel computing
Simple but Effective Heterogeneous Main Memory with On-Chip Memory Controller Support

Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Many-Thread Aware Prefetching Mechanisms for GPGPU Applications

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Exploring the prefetcher/memory controller design space: an opportunistic prefetch scheduling strategy

ARCS'11 Proceedings of the 24th international conference on Architecture of computing systems
METE: meeting end-to-end QoS in multicores through system-wide resource management

Proceedings of the ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Studying the impact of hardware prefetching and bandwidth partitioning in chip-multiprocessors

Proceedings of the ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Prefetch-aware shared resource management for multi-core systems

Proceedings of the 38th annual international symposium on Computer architecture
METE: meeting end-to-end QoS in multicores through system-wide resource management

ACM SIGMETRICS Performance Evaluation Review - Performance evaluation review
Studying the impact of hardware prefetching and bandwidth partitioning in chip-multiprocessors

ACM SIGMETRICS Performance Evaluation Review - Performance evaluation review
Bandwidth constrained coordinated HW/SW prefetching for multicores

Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I
Minimalist open-page: a DRAM page-mode scheduling policy for the many-core era

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
SRP: symbiotic resource partitioning of the memory hierarchy in CMPs

HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
Multiple sub-row buffers in DRAM: unlocking performance and energy improvement opportunities

Proceedings of the 26th ACM international conference on Supercomputing
Unified memory optimizing architecture: memory subsystem control with a unified predictor

Proceedings of the 26th ACM international conference on Supercomputing
Making data prefetch smarter: adaptive prefetching on POWER7

Proceedings of the 21st international conference on Parallel architectures and compilation techniques
To hardware prefetch or not to prefetch?: a virtualized environment study and core binding approach

Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
OWL: cooperative thread array aware scheduling techniques for improving GPGPU performance

Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Exploiting uniform vector instructions for GPGPU performance, energy efficiency, and opportunistic reliability enhancement

Proceedings of the 27th international ACM conference on International conference on supercomputing
Orchestrated scheduling and prefetching for GPGPUs

Proceedings of the 40th Annual International Symposium on Computer Architecture
A network congestion-aware memory subsystem for manycore

ACM Transactions on Embedded Computing Systems (TECS) - Special Section on Wireless Health Systems, On-Chip and Off-Chip Network Architectures
Meeting midway: improving CMP performance with memory-side prefetching

PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques

Quantified Score

Hi-index	0.00

Visualization

Abstract

Existing DRAM controllers employ rigid, non-adaptive scheduling and buffer management policies when servicing prefetch requests. Some controllers treat prefetch requests the same as demand requests, others always prioritize demand requests over prefetch requests. However, none of these rigid policies result in the best performance because they do not take into account the usefulness of prefetch requests. If prefetch requests are useless, treating prefetches and demands equally can lead to significant performance loss and extra bandwidth consumption. In contrast, if prefetch requests are useful, prioritizing demands over prefetches can hurt performance by reducing DRAM throughput and delaying the service of useful requests. This paper proposes a new low-cost memory controller, called Prefetch-Aware DRAM Controller (PADC), that aims to maximize the benefit of useful prefetches and minimize the harm caused by useless prefetches. To accomplish this, PADC estimates the usefulness of prefetch requests and dynamically adapts its scheduling and buffer management policies based on the estimates. The key idea is to 1) adaptively prioritize between demand and prefetch requests, and 2) drop useless prefetches to free up memory system resources, based on the accuracy of the prefetcher. Our evaluation shows that PADC significantly outperforms previous memory controllers with rigid prefetch handling policies on both single- and multi-core systems with a variety of prefetching algorithms. Across a wide range of multiprogrammed SPEC CPU 2000/2006 workloads, it improves system performance by 8.2%on a 4-core system and by 9.9%on an 8-core system while reducing DRAM bandwidth consumption by 10.7% and 9.4% respectively.