A fill-unit approach to multiple instruction issue
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Exploiting instruction level parallelism in processors by caching scheduled groups
Proceedings of the 24th annual international symposium on Computer architecture
Complexity-effective superscalar processors
Proceedings of the 24th annual international symposium on Computer architecture
Power considerations in the design of the Alpha 21264 microprocessor
DAC '98 Proceedings of the 35th annual Design Automation Conference
Pipeline gating: speculation control for energy reduction
Proceedings of the 25th annual international symposium on Computer architecture
Proceedings of the 14th international conference on Supercomputing
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
Dynamo: a transparent dynamic optimization system
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Inherently Lower-Power High-Performance Superscalar Architectures
IEEE Transactions on Computers
Power and energy reduction via pipeline balancing
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Power reduction through work reuse
ISLPED '01 Proceedings of the 2001 international symposium on Low power electronics and design
A large, fast instruction window for tolerating cache misses
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
A scalable instruction queue design using dependence chains
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Reducing power with dynamic critical path information
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Recovery Mechanism for Latency Misprediction
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Improving quasi-dynamic schedules through region slip
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Combining Software and Hardware Monitoring for Improved Power and Performance Tuning
INTERACT '03 Proceedings of the Seventh Workshop on Interaction between Compilers and Computer Architectures
Cyclone: a broadcast-free dynamic instruction scheduler with selective replay
Proceedings of the 30th annual international symposium on Computer architecture
Exploiting compiler-generated schedules for energy savings in high-performance processors
Proceedings of the 2003 international symposium on Low power electronics and design
Data-Flow Prescheduling for Large Instruction Windows in Out-of-Order Processors
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Power Awareness through Selective Dynamically Optimized Traces
Proceedings of the 31st annual international symposium on Computer architecture
Software Directed Issue Queue Power Reduction
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Low-Complexity Distributed Issue Queue
HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
MinneSPEC: A New SPEC Benchmark Workload for Simulation-Based Computer Architecture Research
IEEE Computer Architecture Letters
Impact of virtual execution environments on processor energy consumption and hardware adaptation
Proceedings of the 2nd international conference on Virtual execution environments
Hybrid-scheduling for reduced energy consumption in high-performance processors
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Discerning the dominant out-of-order performance advantage: is it speculation or dynamism?
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Hi-index | 0.00 |
In an out-of-order issue processor, instructions are dynamically reordered and issued to function units in their data-ready order rather than their original program order to achieve high performance. The logic that facilitates dynamic issue is one of the most power-hungry and time-critical components in a typical out-of-order issue processor.This paper develops a cooperative hardware/software technique to reduce complexity and energy consumption of the issue logic. The proposed scheme is based on the observation that not all instructions in a program require the same amount of dynamic reordering. Instructions that belong to basic blocks for which the compiler can perform near-optimal sche- duling do not need any intra-block instruction reordering but require only inter-block instruction overlap. In contrast, blocks where the compiler is limited by artificial dependences and memory misses require both intra-block and inter-block instruction reordering. The proposed Reorder-Sensitive Issue Scheme utilizes a novel compile-time analyzer to evaluate the quality of schedules generated by the static scheduler and to estimate the dynamic reordering requirement of instructions within each basic block. At the micro-architecture-level, we propose a novel issue queue that exploits the varying dynamic scheduling requirement of basic blocks to lower the power dissipation and complexity of the dynamic issue hardware.An evaluation of the technique on several SPEC integer benchmarks indicates that we can reduce the energy consumption in the issue queue on average by 72% with only 5% performance degradation Additionally, the proposed issue hardware is significantly less complex when compared to a conventional monolithic out-of-order issue queue, providing the potential for high clock speeds.