The wakeup logic is part of the issue window and manages the ready flags of operands for dynamic instruction scheduling. Conventional wakeup logic is based on association and is composed of a RAM and a CAM. Because this logic cannot be pipelined and the delays of these memories are dominated by wire delay, it becomes more critical as pipelines deepen and feature sizes shrink. This paper describes a new scheduling scheme based not on association but on matrices that represent the dependences between instructions. Because the update logic of the matrices detects dependences between instructions in the same way the register renaming logic does, the wakeup operation is realized simply by reading the matrices. This paper also describes a technique that reduces the effective size of the matrices at the cost of a small IPC penalty. We designed layouts of the logic under a 0.18 µm CMOS design rule provided by Fujitsu Limited and calculated the delays. We also evaluated the IPC penalties by cycle-level simulation. The results show that our scheme achieves a 2.7 GHz clock speed with an IPC degradation of about 1%.
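The idea of matrix-based wakeup can be illustrated with a small software model. The sketch below is not the circuit described in the paper; it is a minimal toy under stated assumptions: each window slot holds one row of a bit matrix, bit j of row i means "slot i still waits on the instruction in slot j", producers are found at dispatch time through a rename-style table, and every result is assumed to become available when its producer issues. The class name `DependenceMatrix` and its methods are hypothetical names chosen for illustration.

```python
class DependenceMatrix:
    """Toy model of a dependence-matrix issue window (illustrative only)."""

    def __init__(self, size):
        self.size = size
        self.rows = [0] * size        # rows[i] bit j set: slot i waits on slot j
        self.valid = [False] * size   # slot currently holds an unissued instruction
        self.producer = {}            # logical register -> slot of in-flight writer

    def dispatch(self, slot, dest, srcs):
        """Fill one row, detecting dependences much as register renaming would."""
        row = 0
        for reg in srcs:
            p = self.producer.get(reg)
            if p is not None:         # source is produced by an in-flight instruction
                row |= 1 << p
        self.rows[slot] = row
        self.valid[slot] = True
        if dest is not None:
            self.producer[dest] = slot

    def ready(self):
        """Wakeup is only a read: a valid slot with an all-zero row is ready."""
        return [i for i in range(self.size) if self.valid[i] and self.rows[i] == 0]

    def issue(self, slot):
        """Issue a slot and clear its column (assumption: result is ready at issue)."""
        self.valid[slot] = False
        mask = ~(1 << slot)
        for i in range(self.size):
            self.rows[i] &= mask
        for reg, p in list(self.producer.items()):
            if p == slot:
                del self.producer[reg]


window = DependenceMatrix(4)
window.dispatch(0, dest="r1", srcs=["r2", "r3"])  # no in-flight producers
window.dispatch(1, dest="r4", srcs=["r1", "r5"])  # waits on slot 0
window.dispatch(2, dest="r6", srcs=["r4", "r1"])  # waits on slots 0 and 1

order = []
while window.ready():
    slot = window.ready()[0]
    order.append(slot)
    window.issue(slot)
print(order)  # [0, 1, 2]
```

In this model no tag is ever compared against the window contents: all dependence detection happens once, at dispatch, and readiness is read directly from the rows. That is the property the abstract attributes to the matrix scheme, although the hardware organization in the paper may differ from this software analogy.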