Accurate Instruction Pre-scheduling in Dynamically Scheduled Processors

Authors:
Woojin Choi;Seok-Jun Park;Michel Dubois
Affiliations:
Department of Electrical Engineering, University of Southern California;System LSI Division, Samsung Electronics Corporation;Department of Electrical Engineering, University of Southern California
Venue:
Transactions on High-Performance Embedded Architectures and Compilers II
Year:
2009

Abstract

The role of the instruction scheduler is to supply instructions to functional units in a timely manner so as to avoid data and structural hazards. Current schedulers rely on broadcasting result register numbers to all instructions waiting in the issue queue and on a global arbiter that selects ready instructions from that queue. This approach, called broadcast scheduling, does not scale well because of its complexity. Data-flow pre-scheduling has been proposed to reduce the complexity of broadcast schedulers. The basic idea is to predict the issue time of each instruction from the availability of its operands and then count down until the instruction is ready to issue. However, resource conflicts for issue slots and functional units delay the issue of the conflicting instructions and cause a large number of replays. We propose to add instruction pre-selection to data-flow pre-schedulers for accurate instruction pre-scheduling. Our pre-scheduler keeps track of the allocation status of resources so that resource conflicts are eliminated. Pre-scheduled instructions are stored in an issue buffer until their issue delay elapses and then issue automatically. Our analysis shows that pre-schedulers with pre-selection improve performance by 60% over current broadcast schedulers in pipeline designs where the scheduler is the bottleneck. We expect this result to hold in future technologies, as logic-intensive designs with short wires will be preferable to designs with long wire delays.
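
As a rough illustration of the idea in the abstract, and not the authors' hardware design, the following Python sketch assigns an issue cycle to each instruction in program order. It first predicts the earliest cycle at which the operands are available (data-flow pre-scheduling) and then skips any cycle whose issue slots are already reserved (pre-selection). The function name `preschedule`, the `(dest, srcs, latency)` tuple format, the fixed latencies, and the single `issue_width` resource are all assumptions made for this example; functional-unit conflicts are not modeled.

```python
from collections import defaultdict


def preschedule(instrs, issue_width=2):
    """Assign an issue cycle to each instruction, in program order.

    instrs is a list of (dest, srcs, latency) tuples. This format is
    hypothetical and chosen only for the illustration.
    """
    reg_ready = defaultdict(int)   # cycle at which each register value is available
    slots_used = defaultdict(int)  # issue slots already reserved in each cycle
    schedule = []
    for dest, srcs, latency in instrs:
        # Data-flow pre-scheduling: earliest cycle at which all operands are ready.
        cycle = max((reg_ready[s] for s in srcs), default=0)
        # Pre-selection: move past cycles whose issue slots are fully reserved,
        # so the predicted issue time already accounts for the conflict.
        while slots_used[cycle] >= issue_width:
            cycle += 1
        slots_used[cycle] += 1
        # The result becomes available a fixed latency after issue (assumed).
        reg_ready[dest] = cycle + latency
        schedule.append(cycle)
    return schedule


program = [
    ("r1", [], 1),
    ("r2", [], 1),
    ("r3", [], 1),            # third independent instruction: no slot left in cycle 0
    ("r4", ["r1", "r2"], 1),  # depends on the first two results
]
print(preschedule(program, issue_width=2))  # [0, 0, 1, 1]
```

Without the reservation step, the third instruction would also be predicted to issue in cycle 0, and in a two-wide machine that misprediction is the kind of resource conflict the abstract says leads to replays.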