The role of the instruction scheduler is to supply instructions to functional units in a timely manner while avoiding data and structural hazards. Current schedulers broadcast result register numbers to all instructions waiting in the issue queue and rely on a global arbiter to select ready instructions from that queue. This approach, called broadcast scheduling, does not scale well because of its complexity. Data-flow pre-scheduling has been proposed to reduce the complexity of broadcast schedulers. The basic idea is to predict the issue time of each instruction from the availability of its operands and then count down that delay until the instruction is ready to issue. However, resource conflicts for issue slots and functional units delay the issue of the conflicting instructions and cause a large number of replays. We propose adding instruction pre-selection to data-flow pre-schedulers to make instruction pre-scheduling accurate. Our pre-scheduler keeps track of the allocation status of resources so that resource conflicts are eliminated. Pre-scheduled instructions are stored in an issue buffer until their issue delay elapses and then issue automatically. Our analysis shows that pre-schedulers with pre-selection improve performance by 60% over current broadcast schedulers in pipeline designs where the scheduler is the bottleneck. We expect this result to hold in future technologies, where logic-intensive designs with short wires will be preferable to designs with long wire delays.
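The difference between plain data-flow pre-scheduling and pre-scheduling with pre-selection can be illustrated with a small sketch. This is not the paper's design, only a toy model under stated assumptions: the instruction format `(dest, sources, latency)`, the function name `preschedule`, the `issue_width` parameter, and the single pool of identical issue slots are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def preschedule(instrs, issue_width=2, preselect=True):
    """Assign a predicted issue cycle to each instruction, in program order.

    instrs: list of (dest, sources, latency) tuples (hypothetical format).
    Returns the list of predicted issue cycles, one per instruction.
    """
    ready_at = defaultdict(int)    # register -> cycle its value is predicted ready
    slots_used = defaultdict(int)  # cycle -> issue slots already reserved
    schedule = []
    for dest, sources, latency in instrs:
        # Data-flow pre-scheduling: the earliest issue cycle is the one in
        # which all source operands are predicted to be available.
        cycle = max((ready_at[s] for s in sources), default=0)
        if preselect:
            # Pre-selection (assumed model): consult the resource allocation
            # table and skip cycles whose issue slots are all reserved.
            while slots_used[cycle] >= issue_width:
                cycle += 1
            slots_used[cycle] += 1
        schedule.append(cycle)
        ready_at[dest] = cycle + latency
    return schedule
```

For three independent single-cycle instructions followed by one that depends on the third, on a 2-wide machine, the version without pre-selection predicts cycles `[0, 0, 0, 1]`, which oversubscribes cycle 0 and would force a replay in a real pipeline. With pre-selection the prediction becomes `[0, 0, 1, 2]`: the third instruction is moved to a free slot and its dependent is delayed accordingly, so no conflict remains at issue time.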