Reducing Memory Latency via Read-after-Read Memory Dependence Prediction

Authors:
Andreas Moshovos; Gurindar S. Sohi
Affiliations:
Univ. of Toronto, Ontario, Canada; Univ. of Wisconsin-Madison
Venue:
IEEE Transactions on Computers
Year:
2002

Abstract

We observe that typical programs exhibit highly regular read-after-read (RAR) memory dependence streams. To exploit this regularity, we introduce RAR memory dependence prediction. This technique predicts 1) whether a load will access a memory location that a preceding load accesses and 2) exactly which preceding load that is. The prediction is made without knowledge of the corresponding memory addresses. We also present two techniques that use RAR memory dependence prediction to reduce memory latency. In the first technique, a load may obtain a value by naming a preceding load with which an RAR dependence is predicted. The second technique speculatively converts a series of $\mathrm{LOAD}_1\text{-}\mathrm{USE}_1, \ldots, \mathrm{LOAD}_N\text{-}\mathrm{USE}_N$ chains into a single $\mathrm{LOAD}_1\text{-}\mathrm{USE}_1 \ldots \mathrm{USE}_N$ producer/consumer graph. This is done whenever RAR dependences are predicted among the $\mathrm{LOAD}_i$ instructions. Our techniques can be implemented as small extensions to the previously proposed speculative memory cloaking and speculative memory bypassing, which are based on read-after-write (RAW) dependence prediction. On average, our RAR-based techniques provide correct values for an additional 20 percent (integer codes) and 30 percent (floating-point codes) of all loads. Moreover, a combined RAW- and RAR-based cloaking/bypassing mechanism improves performance by 6.44 percent (integer) and 4.66 percent (floating-point) over a highly aggressive dynamically scheduled superscalar processor that uses naive memory dependence speculation. By comparison, the original RAW-based cloaking/bypassing mechanism yields improvements of 4.28 percent (integer) and 3.20 percent (floating-point). When no memory dependence speculation is used, our techniques yield speedups of 9.85 percent (integer) and 6.14 percent (floating-point).
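The first technique in the abstract, where a load obtains its value by naming a preceding load, can be illustrated with a small software model. The sketch below is not the authors' hardware design: the class `RarCloakingSketch`, its four dictionaries, the choice to remember the first load seen for each address, and the trace in `run_demo` are all hypothetical names and assumptions introduced for illustration. It omits stores and table capacity limits entirely. It shows only the core idea: a dependence is detected through addresses once, and is afterwards predicted from the load's PC alone.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class RarCloakingSketch:
    """Toy model of RAR dependence prediction (all names are hypothetical)."""
    # Detection state: address -> PC of the first load seen reading it (assumed policy).
    first_loader: Dict[int, int] = field(default_factory=dict)
    # Prediction state: load PC -> synonym it writes (producer) or reads (consumer).
    producer_synonym: Dict[int, int] = field(default_factory=dict)
    consumer_synonym: Dict[int, int] = field(default_factory=dict)
    # Synonym -> value most recently loaded by the producer.
    synonym_value: Dict[int, int] = field(default_factory=dict)
    next_synonym: int = 0
    correct: int = 0
    wrong: int = 0

    def execute_load(self, pc: int, address: int,
                     memory: Dict[int, int]) -> Tuple[int, Optional[int]]:
        """Return (actual value, speculative value or None) for one dynamic load."""
        # 1. Predict using only the PC; the address is not consulted here.
        synonym = self.consumer_synonym.get(pc)
        predicted = self.synonym_value.get(synonym) if synonym is not None else None

        # 2. Perform the real access and verify any speculative value.
        actual = memory[address]
        if predicted is not None:
            if predicted == actual:
                self.correct += 1
            else:
                self.wrong += 1

        # 3. Detect an RAR dependence: has a different load already read this address?
        earlier = self.first_loader.get(address)
        if earlier is None:
            self.first_loader[address] = pc
        elif earlier != pc:
            shared = self.producer_synonym.get(earlier)
            if shared is None:
                shared = self.next_synonym
                self.next_synonym += 1
                self.producer_synonym[earlier] = shared
            self.consumer_synonym[pc] = shared

        # 4. If this load is a known producer, publish its value under the synonym.
        own = self.producer_synonym.get(pc)
        if own is not None:
            self.synonym_value[own] = actual
        return actual, predicted


def run_demo() -> Tuple[List[Tuple[int, Optional[int]]], RarCloakingSketch]:
    """Two static loads (PCs 0x10 and 0x20) read the same address in each iteration."""
    memory = {100: 7, 104: 9, 108: 11}
    sketch = RarCloakingSketch()
    results = []
    for address in (100, 104, 108):
        results.append(sketch.execute_load(0x10, address, memory))
        results.append(sketch.execute_load(0x20, address, memory))
    return results, sketch
```

In this trace the dependence between the two loads is detected in the first iteration. In later iterations the load at PC 0x20 receives a speculative value from the load at PC 0x10 even though the address changes every iteration. A real design would also have to invalidate or verify values when an intervening store writes the location, which this sketch handles only through the final value comparison.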