Limits of instruction-level parallelism
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Alpha implementations and architecture: complete reference and guide
Alpha implementations and architecture: complete reference and guide
The SimpleScalar tool set, version 2.0
ACM SIGARCH Computer Architecture News
Selective cache ways: on-demand cache resource allocation
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
Optimization of high-performance superscalar architectures for energy efficiency
ISLPED '00 Proceedings of the 2000 international symposium on Low power electronics and design
A framework for dynamic energy efficiency and temperature management
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
A circuit level implementation of an adaptive issue queue for power-aware microprocessors
GLSVLSI '01 Proceedings of the 11th Great Lakes symposium on VLSI
Power and energy reduction via pipeline balancing
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Cache decay: exploiting generational behavior to reduce cache leakage power
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Energy: efficient instruction dispatch buffer design for superscalar processors
ISLPED '01 Proceedings of the 2001 international symposium on Low power electronics and design
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Integrating Adaptive On-Chip Storage Structures for Reduced Dynamic Power
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Positional adaptation of processors: application to energy reduction
Proceedings of the 30th annual international symposium on Computer architecture
AccuPower: An Accurate Power Estimation Tool for Superscalar Microprocessors
Proceedings of the conference on Design, automation and test in Europe
Dynamic Thermal Management for High-Performance Microprocessors
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Proceedings of the 2008 ACM SIGPLAN-SIGBED conference on Languages, compilers, and tools for embedded systems
Stochastic modeling of a thermally-managed multi-core system
Proceedings of the 45th annual Design Automation Conference
Lazy instruction scheduling: keeping performance, reducing power
Proceedings of the 13th international symposium on Low power electronics and design
Proceedings of the International Conference and Workshop on Emerging Trends in Technology
A model to exploit power-performance efficiency in superscalar processors via structure resizing
Proceedings of the 20th symposium on Great lakes symposium on VLSI
Microvisor: a runtime architecture for thermal management in chip multiprocessors
Transactions on High-Performance Embedded Architectures and Compilers IV
MLP-Aware instruction queue resizing: the key to power-efficient performance
ARCS'10 Proceedings of the 23rd international conference on Architecture of Computing Systems
Hi-index | 14.98 |
The "one-size-fits-all” philosophy used for permanently allocating datapath resources in today's superscalar CPUs to maximize performance across a wide range of applications results in the overcommitment of resources in general. To reduce power dissipation in the datapath, the resource allocations can be dynamically adjusted based on the demands of applications. We propose a mechanism to dynamically, simultaneously, and independently adjust the sizes of the issue queue (IQ), the reorder buffer (ROB), and the load/store queue (LSQ) based on the periodic sampling of their occupancies to achieve significant power savings with minimal impact on performance. Resource upsizing is done more aggressively (compared to downsizing), using the relative rate of blocked dispatches to limit the performance penalty. Our results are validated by the execution of the SPEC 2000 benchmark suite on a substantially modified version of the Simplescalar simulator, where the IQ, the ROB, the LSQ, and the register files are implemented as separate structures, as is the case with most practical implementations. We also use actual VLSI layouts of the datapath components in a 0.18 micron process to accurately measure the energy dissipations for each type of access. For a 4-way superscalar CPU, an average power savings of about 42 percent within the IQ, 74 percent within the ROB (integrating the register file), and 41 percent within the LSQ can be achieved with an average performance penalty of about 5 percent.