Value locality and load value prediction
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Complexity-effective superscalar processors
Proceedings of the 24th annual international symposium on Computer architecture
Highly accurate data value prediction using hybrid predictors
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Proceedings of the 14th international conference on Supercomputing
A large, fast instruction window for tolerating cache misses
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
An instruction set and microarchitecture for instruction level distributed processing
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
A scalable instruction queue design using dependence chains
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
The Counterflow Pipeline Processor Architecture
IEEE Design & Test
The Alpha 21264 Microprocessor
IEEE Micro
Basic Block Distribution Analysis to Find Periodic Behavior and Simulation Points in Applications
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Recovery Mechanism for Latency Misprediction
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Just Say No: Benefits of Early Cache Miss Determination
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
On Some Implementation Issues for Value Prediction on Wide-Issue ILP Processors
PACT '00 Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques
Cyclone: a broadcast-free dynamic instruction scheduler with selective replay
Proceedings of the 30th annual international symposium on Computer architecture
Data-Flow Prescheduling for Large Instruction Windows in Out-of-Order Processors
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Instruction packing: reducing power and delay of the dynamic scheduling logic
ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
Tornado warning: the perils of selective replay in multithreaded processors
Proceedings of the 19th annual international conference on Supercomputing
Reducing the Energy of Speculative Instruction Schedulers
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Power-Efficient Wakeup Tag Broadcast
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Instruction packing: Toward fast and energy-efficient instruction scheduling
ACM Transactions on Architecture and Code Optimization (TACO)
Exploiting Operand Availability for Efficient Simultaneous Multithreading
IEEE Transactions on Computers
On reducing energy-consumption by late-inserting instructions into the issue queue
ISLPED '07 Proceedings of the 2007 international symposium on Low power electronics and design
A low-complexity microprocessor design with speculative pre-execution
Journal of Systems Architecture: the EUROMICRO Journal
Accurate Instruction Pre-scheduling in Dynamically Scheduled Processors
Transactions on High-Performance Embedded Architectures and Compilers II
Instruction recirculation: eliminating counting logic in wakeup-free schedulers
Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
PACS'04 Proceedings of the 4th international conference on Power-Aware Computer Systems
Hi-index | 0.00 |
In contemporary out-of-order superscalar design, high IPC is mainly achieved by exposing high instruction level parallelism (ILP). Scaling issue window size can certainly provide more ILP; however, future processor scaling demands threaten to limit the size of the issue window.In this study, we propose a dynamic instruction sorting mechanism that provides more ILP without increasing the size of the issue window. In our approach, early in the pipeline, we predict how long an instruction needs to wait before it can be issued, i.e. the waiting time for its operands to be produced. Using this knowledge, the instructions are placed into a sorting structure, which allows instructions with shorter waiting times enter the issue window ahead of those instructions with longer waiting times, preventing long-waiting instructions from clogging the issue queue.The accuracy in predicting instruction waiting times directly determines the effectiveness of our sorting mechanism. While most instructions have deterministic execution latencies, predicting load execution times is more difficult due to cache misses and in-flight loads. Loads are particularly challenging since their execution time can vary significantly. In this study, we examine techniques to predict load execution time accurately, based on data reference history.