Limits of instruction-level parallelism
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Very low power pipelines using significance compression
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Dynamic zero compression for cache energy reduction
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Energy: efficient instruction dispatch buffer design for superscalar processors
ISLPED '01 Proceedings of the 2001 international symposium on Low power electronics and design
Low-complexity reorder buffer architecture
ICS '02 Proceedings of the 16th international conference on Supercomputing
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
An Adaptive Issue Queue for Reduced Power at High Performance
PACS '00 Proceedings of the First International Workshop on Power-Aware Computer Systems-Revised Papers
Dynamically Exploiting Narrow Width Operands to Improve Processor Power and Performance
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
AccuPower: An Accurate Power Estimation Tool for Superscalar Microprocessors
Proceedings of the conference on Design, automation and test in Europe
Energy-efficient issue queue design
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special section on low power
Complexity-Effective Reorder Buffer Designs for Superscalar Processors
IEEE Transactions on Computers
Power-aware compilation for register file energy reduction
International Journal of Parallel Programming - Special issue: Workshop on application specific processors (WASP)
Saving register-file leakage power by monitoring instruction sequence in ROB
EUC'06 Proceedings of the 2006 international conference on Emerging Directions in Embedded and Ubiquitous Computing
Hi-index | 0.00 |
Some of today's superscalar processors, such as the Intel Pentium III, implement physical registers using the Reorder Buffer (ROB) slots. As much as 27% of the total CPU power is expended within the ROB in such designs, making the ROB a dominant source of power dissipation within the processor. This paper proposes three relatively independent techniques for the ROB power reduction with no or minimal impact on the performance. These techniques are: 1) dynamic ROB resizing; 2) the use of low-power comparators that dissipate energy mainly on a full match of the comparands and, 3) the use of zero-byte encoding. We validate our results by executing the complete suite of SPEC 95 benchmarks on a true cycle-by-cycle hardware-level simulator and using SPICE measurements for actual layouts of the ROB in 0.5 micron CMOS process. The total power savings achieved within the ROB using our approaches are in excess of 76% with the average performance penalty of less than 3%.