Read-after-read memory dependence prediction

Authors:
Andreas Moshovos; Gurindar S. Sohi
Affiliations:
Electrical and Computer Engineering Department, Northwestern University; Computer Sciences Department, University of Wisconsin-Madison
Venue:
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Year:
1999

Abstract

We identify that typical programs exhibit highly regular read-after-read (RAR) memory dependence streams. We exploit this regularity by introducing read-after-read (RAR) memory dependence prediction. We also present two RAR memory dependence prediction-based memory latency reduction techniques. In the first technique, a load can obtain a value by simply naming a preceding load with which a RAR dependence is predicted. The second technique speculatively converts a series of LOAD_1-USE_1, …, LOAD_N-USE_N chains into a single LOAD_1-USE_1…USE_N producer/consumer graph. Our techniques can be implemented as surgical extensions to the recently proposed read-after-write (RAW) dependence prediction based speculative memory cloaking and speculative memory bypassing. On average, our techniques provide correct values for an additional 20% (integer codes) and 30% (floating-point codes) of all loads. Moreover, a combined RAW- and RAR-based cloaking/bypassing mechanism improves performance by 6.44% (integer) and 4.66% (floating-point) even when naive memory dependence speculation is used. The original RAW-based cloaking/bypassing mechanism yields improvements of 4.28% (integer) and 3.20% (floating-point).
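
The first technique in the abstract, a load obtaining its value by naming an earlier load, can be illustrated with a toy trace-driven model. The sketch below is not the authors' hardware design. The class name `RarCloakingSketch`, its three tables, the unbounded table sizes, and the policy of keeping the earliest load to an address as the dependence source are all assumptions made for illustration, and timing is not modeled at all.

```python
from dataclasses import dataclass, field


@dataclass
class RarCloakingSketch:
    """Toy RAR dependence detector/predictor (hypothetical, untimed)."""
    # address -> PC of the first load to read it since the last store (assumed policy)
    first_loader: dict = field(default_factory=dict)
    # sink load PC -> source load PC it is predicted to depend on
    predicted_source: dict = field(default_factory=dict)
    # load PC -> value that load most recently obtained
    last_value: dict = field(default_factory=dict)
    correct: int = 0
    wrong: int = 0
    unpredicted: int = 0

    def store(self, addr):
        # A store to the address ends any RAR chain running through it.
        self.first_loader.pop(addr, None)

    def load(self, pc, addr, value):
        # Step 1: prediction. If this load names a source load, take the
        # source's latest value as the speculative result and verify it
        # against the value actually read from memory.
        src = self.predicted_source.get(pc)
        if src is not None and src in self.last_value:
            if self.last_value[src] == value:
                self.correct += 1
            else:
                self.wrong += 1
        else:
            self.unpredicted += 1

        # Step 2: detection. A different, earlier load already read this
        # address with no intervening store, so record a RAR dependence.
        prev = self.first_loader.setdefault(addr, pc)
        if prev != pc:
            self.predicted_source[pc] = prev

        self.last_value[pc] = value


def demo():
    """Two static loads read the same, changing address on every iteration."""
    model = RarCloakingSketch()
    for i in range(4):
        addr, value = 100 + i, 7 * i
        model.load(pc=0x10, addr=addr, value=value)  # LOAD_1
        model.load(pc=0x20, addr=addr, value=value)  # LOAD_2, RAR-dependent on LOAD_1
    return model


if __name__ == "__main__":
    m = demo()
    print(m.correct, m.wrong, m.unpredicted)
```

In this trace the address changes on every iteration, so an address-indexed cache of values would not help, but the dependence between the two static loads is stable. Once it has been observed in the first iteration, the second load is supplied a correct value in each later iteration.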