Value locality and load value prediction
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Exceeding the dataflow limit via value prediction
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Dynamic speculation and synchronization of data dependences
Proceedings of the 24th annual international symposium on Computer architecture
Memory dependence prediction using store sets
Proceedings of the 25th annual international symposium on Computer architecture
Predictive techniques for aggressive load speculation
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Automatic Generation of Microarchitecture Simulators
ICCL '98 Proceedings of the 1998 International Conference on Computer Languages
Memory dependence prediction
Load and store reuse using register file contents
ICS '01 Proceedings of the 15th international conference on Supercomputing
Cost Effective Memory Dependence Prediction using Speculation Levels and Color Sets
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Instruction Wake-Up in Wide Issue Superscalars
Euro-Par '01 Proceedings of the 7th International Euro-Par Conference Manchester on Parallel Processing
Memory Ordering: A Value-Based Approach
Proceedings of the 31st annual international symposium on Computer architecture
Store Vulnerability Window (SVW): Re-Execution Filtering for Enhanced Load Optimization
Proceedings of the 32nd annual international symposium on Computer Architecture
Fast branch misprediction recovery in out-of-order superscalar processors
Proceedings of the 19th annual international conference on Supercomputing
Scalable Store-Load Forwarding via Store Queue Index Prediction
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Address-Indexed Memory Disambiguation and Store-to-Load Forwarding
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
"Flea-flicker" Multipass Pipelining: An Alternative to the High-Power Out-of-Order Offense
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Improving single-thread performance with fine-grain state maintenance
Proceedings of the 5th conference on Computing frontiers
Federation: Boosting per-thread performance of throughput-oriented manycore architectures
ACM Transactions on Architecture and Code Optimization (TACO)
SAMIE-LSQ: set-associative multiple-instruction entry load/store queue
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Hi-index | 0.00 |
With the help of the memory dependence predictor the instruction scheduler can speculatively issue load instructions at the earliest possible time without causing significant amounts of memory order violations. For maximum performance, the scheduler must also allow full out-of-order issuing of store instructions since any superfluous ordering of stores results in false memory dependencies which adversely affect the timely issuing of dependent loads. Unfortunately, simple techniques of detecting memory order violations do not work well when store instructions issue out-of-order since they yield many false memory order violations. By using a novel memory order violation detection mechanism that is employed in the retire logic of the processor and delaying the checking for memory order violations, we are able to allow full out-of-order issuing of store instructions without causing false memory order violations. In addition, our mechanism can take advantage of data value redundancy. We present an implementation of our technique using the store set memory dependence predictor. An out-of-order superscalar processor that uses our technique delivers an IPC which is within 100, 96 and 85 % of a processor equipped with an ideal memory disambiguator at issue widths of 8, 16 and 32 instructions respectively.