Scaling the issue window with look-ahead latency prediction

Authors:
Yongxiang Liu;Anahita Shayesteh;Gokhan Memik;Glenn Reinman
Affiliations:
University of California, Los Angeles, CA;University of California, Los Angeles, CA;Northwestern University, Evanston, IL;Northwestern University, Evanston, IL
Venue:
Proceedings of the 18th annual international conference on Supercomputing
Year:
2004

Citing 16
Cited 11

Value locality and load value prediction

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Complexity-effective superscalar processors

Proceedings of the 24th annual international symposium on Computer architecture
Highly accurate data value prediction using hybrid predictors

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
A low-complexity issue logic

Proceedings of the 14th international conference on Supercomputing
A large, fast instruction window for tolerating cache misses

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
An instruction set and microarchitecture for instruction level distributed processing

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
A scalable instruction queue design using dependence chains

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
SimpleScalar: An Infrastructure for Computer System Modeling

Computer
The Counterflow Pipeline Processor Architecture

IEEE Design & Test
The Alpha 21264 Microprocessor

IEEE Micro
Basic Block Distribution Analysis to Find Periodic Behavior and Simulation Points in Applications

Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Recovery Mechanism for Latency Misprediction

Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Just Say No: Benefits of Early Cache Miss Determination

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
On Some Implementation Issues for Value Prediction on Wide-Issue ILP Processors

PACT '00 Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques
Cyclone: a broadcast-free dynamic instruction scheduler with selective replay

Proceedings of the 30th annual international symposium on Computer architecture
Data-Flow Prescheduling for Large Instruction Windows in Out-of-Order Processors

HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture

Instruction packing: reducing power and delay of the dynamic scheduling logic

ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
Tornado warning: the perils of selective replay in multithreaded processors

Proceedings of the 19th annual international conference on Supercomputing
Reducing the Energy of Speculative Instruction Schedulers

ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Power-Efficient Wakeup Tag Broadcast

ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Instruction packing: Toward fast and energy-efficient instruction scheduling

ACM Transactions on Architecture and Code Optimization (TACO)
Exploiting Operand Availability for Efficient Simultaneous Multithreading

IEEE Transactions on Computers
On reducing energy-consumption by late-inserting instructions into the issue queue

ISLPED '07 Proceedings of the 2007 international symposium on Low power electronics and design
A low-complexity microprocessor design with speculative pre-execution

Journal of Systems Architecture: the EUROMICRO Journal
Accurate Instruction Pre-scheduling in Dynamically Scheduled Processors

Transactions on High-Performance Embedded Architectures and Compilers II
Instruction recirculation: eliminating counting logic in wakeup-free schedulers

Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
Reducing delay and power consumption of the wakeup logic through instruction packing and tag memoization

PACS'04 Proceedings of the 4th international conference on Power-Aware Computer Systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

In contemporary out-of-order superscalar design, high IPC is mainly achieved by exposing high instruction level parallelism (ILP). Scaling issue window size can certainly provide more ILP; however, future processor scaling demands threaten to limit the size of the issue window.In this study, we propose a dynamic instruction sorting mechanism that provides more ILP without increasing the size of the issue window. In our approach, early in the pipeline, we predict how long an instruction needs to wait before it can be issued, i.e. the waiting time for its operands to be produced. Using this knowledge, the instructions are placed into a sorting structure, which allows instructions with shorter waiting times enter the issue window ahead of those instructions with longer waiting times, preventing long-waiting instructions from clogging the issue queue.The accuracy in predicting instruction waiting times directly determines the effectiveness of our sorting mechanism. While most instructions have deterministic execution latencies, predicting load execution times is more difficult due to cache misses and in-flight loads. Loads are particularly challenging since their execution time can vary significantly. In this study, we examine techniques to predict load execution time accurately, based on data reference history.