Speculative Memory Cloaking and Bypassing

Authors:
Andreas Moshovos;Gurindar S. Sohi
Affiliations:
-;-
Venue:
International Journal of Parallel Programming - Special issue on the 30th annual ACM/IEEE international symposium on microarchitecture, part II
Year:
1999

Citing 21
Cited 5

Dataflow machine architecture

ACM Computing Surveys (CSUR)
Analysis of memory referencing behavior for design of local memories

ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
An effective on-chip preloading scheme to reduce data access penalty

Proceedings of the 1991 ACM/IEEE conference on Supercomputing
A limit study of local memory requirements using value reuse profiles

Proceedings of the 28th annual international symposium on Microarchitecture
Zero-cycle loads: microarchitecture support for reducing load latency

Proceedings of the 28th annual international symposium on Microarchitecture
Increasing cache port efficiency for dynamic superscalar microprocessors

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Value locality and load value prediction

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Exceeding the dataflow limit via value prediction

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
The performance potential of data dependence speculation & collapsing

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Dynamic speculation and synchronization of data dependences

Proceedings of the 24th annual international symposium on Computer architecture
Improving the accuracy and performance of memory communication through renaming

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Streamlining inter-operation memory communication via data dependence prediction

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Memory dependence prediction using store sets

Proceedings of the 25th annual international symposium on Computer architecture
Value locality and speculative execution

Value locality and speculative execution
Predictive techniques for aggressive load speculation

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Compiler-directed early load-address generation

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
A novel renaming scheme to exploit value temporal locality through physical register reuse and unification

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Performance Simulation of an Alpha Microprocessor

Computer
Improving CC-NUMA Performance Using Instruction-Based Prediction

HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
Memory dependence prediction

Memory dependence prediction
Design and evaluation of a multiscalar processor

Design and evaluation of a multiscalar processor

Read-after-read memory dependence prediction

Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Incremental Commit Groups for Non-Atomic Trace Processing

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Fire-and-Forget: Load/Store Scheduling with No Store Queue at All

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Counting Dependence Predictors

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
An evaluation of the TRIPS computer system

Proceedings of the 14th international conference on Architectural support for programming languages and operating systems

Quantified Score

Hi-index	0.02

Visualization

Abstract

We revisit memory hierarchy design viewing memory as an inter-operation communication mechanism. We show how dynamically collected information about inter-operation memory communication can be used to improve memory latency. We propose two techniques: (1) Speculative Memory Cloaking, and (2) Speculative Memory Bypassing. In the first technique, we use memory dependence prediction to speculatively identify dependent loads and stores early in the pipeline. These instructions may then communicate prior to address calculation and disambiguation via a fast communication mechanism. In the second technique, we use memory dependence prediction to speculatively transform DEF-store-load-USE dependence chains within the instruction window into DEF-USE ones. As a result, dependent stores and loads are taken off the communication path resulting in further reduction in communication latency. Experimental analysis shows that our methods, on the average, correctly handle 40% (integer) and 19% (floating point) of all memory loads. Moreover, our techniques result in performance improvements of 4.28% (integer) and 3.20% (floating point) over a highly aggressive, dynamically scheduled processor implementing naive memory dependence speculation. We also study the value and address locality characteristics of the values our methods correctly handle. We demonstrate that our methods are orthogonal to both address and value prediction.