Address-Value Delta (AVD) Prediction: A Hardware Technique for Efficiently Parallelizing Dependent Cache Misses

Authors:
Onur Mutlu;Hyesoon Kim;Yale N. Patt
Affiliations:
-;-;-
Venue:
IEEE Transactions on Computers
Year:
2006

Citing 22
Cited 1

Supporting dynamic data structures on distributed-memory machines

ACM Transactions on Programming Languages and Systems (TOPLAS)
Hitting the memory wall: implications of the obvious

ACM SIGARCH Computer Architecture News
SPAID: software prefetching in pointer- and call-intensive environments

Proceedings of the 28th annual international symposium on Microarchitecture
Compiler-based prefetching for recursive data structures

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Correlation-based hardware prefetching

Correlation-based hardware prefetching
Improving data cache performance by pre-executing instructions under a cache miss

ICS '97 Proceedings of the 11th international conference on Supercomputing
Power considerations in the design of the Alpha 21264 microprocessor

DAC '98 Proceedings of the 35th annual Design Automation Conference
Dependence based prefetching for linked data structures

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Push vs. pull: data movement for linked data structures

Proceedings of the 14th international conference on Supercomputing
The memory gap and the future of high performance memories

ACM SIGARCH Computer Architecture News
Benchmark health considered harmful

ACM SIGARCH Computer Architecture News
Dynamic hot data stream prefetching for general-purpose programs

PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
Efficient discovery of regular stride patterns in irregular programs and its use in compiler prefetching

PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
Automatically characterizing large scale program behavior

Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
A stateless, content-directed data prefetching mechanism

Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Pointer cache assisted prefetching

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Enhancing memory level parallelism via recovery-free value prediction

ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Prefetch injection based on hardware monitoring and object metadata

Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
An Analysis of the Performance Impact of Wrong-Path Memory References on Out-of-Order and Runahead Execution Processors

IEEE Transactions on Computers
Address-Value Delta (AVD) Prediction: Increasing the Effectiveness of Runahead Execution by Exploiting Regular Memory Allocation Patterns

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
MinneSPEC: A New SPEC Benchmark Workload for Simulation-Based Computer Architecture Research

IEEE Computer Architecture Letters
CAVA: Hiding L2 Misses with Checkpoint-Assisted Value Prediction

IEEE Computer Architecture Letters

Exploiting selective instruction reuse and value prediction in a superscalar architecture

Journal of Systems Architecture: the EUROMICRO Journal

Quantified Score

Hi-index	14.98

Visualization

Abstract

While runahead execution is effective at parallelizing independent long-latency cache misses, it is unable to parallelize dependent long-latency cache misses. To overcome this limitation, this paper proposes a novel hardware technique, address-value delta (AVD) prediction. An AVD predictor keeps track of the address (pointer) load instructions for which the arithmetic difference (i.e., delta) between the effective address and the data value is stable. If such a load instruction incurs a long-latency cache miss during runahead execution, its data value is predicted by subtracting the stable delta from its effective address. This prediction enables the preexecution of dependent instructions, including load instructions that incur long-latency cache misses. We analyze why and for what kind of loads AVD prediction works and describe the design of an implementable AVD predictor. We also describe simple hardware and software optimizations that can significantly improve the benefits of AVD prediction and analyze the interaction of AVD prediction with runahead efficiency techniques and stream-based data prefetching. Our analysis shows that AVD prediction is complementary to these techniques. Our results show that augmenting a runahead processor with a simple, 16-entry AVD predictor improves the average execution time of a set of pointer-intensive applications by 14.3 percent (7.5 percent excluding benchmark health).