Design and optimization of the store vectors memory dependence predictor

Authors:
Samantika Subramaniam;Gabriel H. Loh
Affiliations:
Georgia Institute of Technology, Atlanta, GA;Georgia Institute of Technology, Atlanta, GA
Venue:
ACM Transactions on Architecture and Code Optimization (TACO)
Year:
2009

Citing 31
Cited 0

Efficient detection of all pointer and array access errors

PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Multiscalar processors

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Dynamic speculation and synchronization of data dependences

Proceedings of the 24th annual international symposium on Computer architecture
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Memory dependence prediction using store sets

Proceedings of the 25th annual international symposium on Computer architecture
Speculation techniques for improving load related instruction scheduling

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Understanding the backward slices of performance degrading instructions

Proceedings of the 27th annual international symposium on Computer architecture
A large, fast instruction window for tolerating cache misses

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Select-free instruction scheduling logic

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
A high-speed dynamic instruction scheduling scheme for superscalar processors

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
One Billion Transistors, One Uniprocessor, One Chip

Computer
SimpleScalar: An Infrastructure for Computer System Modeling

Computer
The Alpha 21264 Microprocessor

IEEE Micro
Hierarchical Scheduling Windows

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture

Proceedings of the 30th annual international symposium on Computer architecture
Data-Flow Prescheduling for Large Instruction Windows in Out-of-Order Processors

HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Complexity-effective superscalar processors

Complexity-effective superscalar processors
Memory dependence prediction

Memory dependence prediction
Picking Statistically Valid and Early Simulation Points

Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques
WaveScalar

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Checkpoint Processing and Recovery: Towards Scalable Large Instruction Window Processors

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Memory Ordering: A Value-Based Approach

Proceedings of the 31st annual international symposium on Computer architecture
Continual flow pipelines

ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Toward kilo-instruction processors

ACM Transactions on Architecture and Code Optimization (TACO)
Store Vulnerability Window (SVW): Re-Execution Filtering for Enhanced Load Optimization

Proceedings of the 32nd annual international symposium on Computer Architecture
Scalable Store-Load Forwarding via Store Queue Index Prediction

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
MiBench: A free, commercially representative embedded benchmark suite

WWC '01 Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop
NoSQ: Store-Load Communication without a Store Queue

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Matrix scheduler reloaded

Proceedings of the 34th annual international symposium on Computer architecture
BioBench: A Benchmark Suite of Bioinformatics Applications

ISPASS '05 Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005
Amdahl's Law in the Multicore Era

Computer

Quantified Score

Hi-index	0.00

Visualization

Abstract

Allowing loads that do not violate memory ordering to issue out of order with respect to earlier unresolved store addresses is very important for extracting parallelism in large-window superscalar processors. Previous research has proposed memory dependence prediction algorithms to prevent only loads with true memory dependencies from issuing in the presence of unresolved stores. Techniques such as load-store pair identification and store sets have been very successful in achieving performance levels close to that attained by an oracle-dependence predictor, but have relatively complex or power-hungry designs. In this article, we use the idea of dependency vectors from matrix schedulers for nonmemory instructions and adapt them to implement a new dependence prediction algorithm. We show that for conservatively sized processors, a simple PC-indexed table that tracks misordered loads is sufficient to provide most of the performance benefits achieved by more sophisticated predictors. On more aggressive processor configurations, however, our “Store Vector” algorithm provides better performance than the state-of-the-art store sets predictor while maintaining a simpler and more scalable design.