Counting Dependence Predictors

Authors:
Franziska Roesner;Doug Burger;Stephen W. Keckler
Affiliations:
-;-;-
Venue:
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Year:
2008

Citing 22
Cited 1

Multiscalar processors

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Dynamic speculation and synchronization of data dependences

Proceedings of the 24th annual international symposium on Computer architecture
Streamlining inter-operation memory communication via data dependence prediction

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Memory dependence prediction using store sets

Proceedings of the 25th annual international symposium on Computer architecture
Speculation techniques for improving load related instruction scheduling

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Speculative Memory Cloaking and Bypassing

International Journal of Parallel Programming - Special issue on the 30th annual ACM/IEEE international symposium on microarchitecture, part II
An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches

Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Baring It All to Software: Raw Machines

Computer
The Stanford Hydra CMP

IEEE Micro
Basic Block Distribution Analysis to Find Periodic Behavior and Simulation Points in Applications

Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Cost Effective Memory Dependence Prediction using Speculation Levels and Color Sets

Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture

Proceedings of the 30th annual international symposium on Computer architecture
WaveScalar

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Scaling to the End of Silicon with EDGE Architectures

Computer
Using Prime Numbers for Cache Indexing to Eliminate Conflict Misses

HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
The STAMPede approach to thread-level speculation

ACM Transactions on Computer Systems (TOCS)
Scalable Store-Load Forwarding via Store Queue Index Prediction

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Address-Indexed Memory Disambiguation and Store-to-Load Forwarding

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Fire-and-Forget: Load/Store Scheduling with No Store Queue at All

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
NoSQ: Store-Load Communication without a Store Queue

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Core fusion: accommodating software diversity in chip multiprocessors

Proceedings of the 34th annual international symposium on Computer architecture
Composable Lightweight Processors

Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture

Distributed replay protocol for distributed uniprocessors

Proceedings of the 26th ACM international conference on Supercomputing

Quantified Score

Hi-index	0.00

Visualization

Abstract

Modern processors rely on memory dependence prediction to execute load instructions as early as possible, speculating that they are not dependent on an earlier, unissued store. To date, the most sophisticated dependence predictors, such as Store Sets, have been tightly coupled to the fetch and execution streams, requiring global knowledge of the in-flight stream of stores to synchronize loads with specific stores. This paper proposes a new dependence predictor design, called a Counting Dependence Predictor (CDP). The key feature of CDPs is that the prediction mechanism predicts some set of events for which a particular dynamic load should wait, which may include some number of matching stores. By waiting for local events only, this dependence predictor can work effectively in a distributed microarchitecture where centralized fetch and execution streams are infeasible or undesirable. We describe and evaluate a distributed Counting Dependence Predictor and protocol that achieves 92% of the performance of perfect memory disambiguation. It outperforms a load-wait table, similar to the Alpha 21264, by 11%. Idealized, centralized implementations of Store Sets and the Exclusive Collision Predictor, both of which would be difficult to implement in a distributed microarchitecture, achieve 97% and 94% of oracular performance, respectively.