Guarded evaluation: pushing power management to logic synthesis/design
ISLPED '95 Proceedings of the 1995 international symposium on Low power design
Exploiting instruction level parallelism in processors by caching scheduled groups
Proceedings of the 24th annual international symposium on Computer architecture
Complexity-effective superscalar processors
Proceedings of the 24th annual international symposium on Computer architecture
The filter cache: an energy efficient memory structure
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Reducing power in high-performance microprocessors
DAC '98 Proceedings of the 35th annual Design Automation Conference
ISLPED '98 Proceedings of the 1998 international symposium on Low power electronics and design
A Trace Cache Microarchitecture and Evaluation
IEEE Transactions on Computers - Special issue on cache memory and related problems
Evaluation of Design Options for the Trace Cache Fetch Mechanism
IEEE Transactions on Computers - Special issue on cache memory and related problems
MPS: Miss-Path Scheduling for Multiple-Issue Processors
IEEE Transactions on Computers
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
Gated-Vdd: a circuit technique to reduce leakage in deep-submicron cache memories
ISLPED '00 Proceedings of the 2000 international symposium on Low power electronics and design
A static power model for architects
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Micro-operation cache: a power aware frontend for the variable instruction length ISA
ISLPED '01 Proceedings of the 2001 international symposium on Low power electronics and design
Power reduction through work reuse
ISLPED '01 Proceedings of the 2001 international symposium on Low power electronics and design
DRG-cache: a data retention gated-ground cache for low power
Proceedings of the 39th annual Design Automation Conference
The MIPS R10000 Superscalar Microprocessor
IEEE Micro
Filtering Techniques to Improve Trace-Cache Efficiency
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Estimation of Maximum Power-up Current
ASP-DAC '02 Proceedings of the 2002 Asia and South Pacific Design Automation Conference
Leakage and leakage sensitivity computation for combinational circuits
Proceedings of the 2003 international symposium on Low power electronics and design
Power-efficient instruction delivery through trace reuse
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Low power microarchitecture with instruction reuse
Proceedings of the 5th conference on Computing frontiers
LPA: a first approach to the loop processor architecture
HiPEAC'08 Proceedings of the 3rd international conference on High performance embedded architectures and compilers
Reusing cached schedules in an out-of-order processor with in-order issue logic
ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
Discerning the dominant out-of-order performance advantage: is it speculation or dynamism?
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Hi-index | 0.00 |
This paper investigates a possible solution to the problem of power consumption in superscalar, out-of-order processors by proposing a new microarchitecture, specifically designed to reduce increasing power requirements of high-end processors. More precisely, we show that by modifying the well-established superscalar processor architecture, significant savings can be achieved in terms of power consumption. Our approach aims at limiting the growing amount of power used in a typical processor for dynamic optimizations (including out-of-order scheduling and register renaming). Our proposed approach achieves significant power savings by reusing as much as possible from the work done by the front-end of a typical superscalar, out-of-order pipeline, via the use of a special cache nested deeply into the processor structure. By reusing instructions that are already decoded, reordered, and have their registers already renamed, the front end of the pipeline can be turned off for large periods of time with significant savings in the overall power consumption. Experimental results show up to 35% (30% on average) savings in average energy per committed instruction, and 35% (20% on average) savings in energy-delay product, with about 9% average performance loss, over a large spectrum of SPEC95 and SPEC2000 benchmarks.