Inherently Lower-Power High-Performance Superscalar Architectures

Authors: Victor V. Zyuban; Peter M. Kogge
Affiliations: IBM T.J. Watson Research Center, Yorktown Heights, NY; Univ. of Notre Dame, South Bend, IN
Venue: IEEE Transactions on Computers
Year: 2001

Abstract

In recent years, reducing power has become an important design goal for high-performance microprocessors. This work attempts to bring the power issue into the earliest phase of microprocessor development: the definition of the chip microarchitecture. We investigate microarchitecture-level power-optimization techniques for superscalar microprocessors that do not compromise performance. First, we identify the major targets for power reduction within the microarchitecture, that is, the structures where power is heavily consumed today or will be heavily consumed in next-generation superscalar processors. Next, we develop a new, energy-efficient version of a multicluster microarchitecture that reduces energy at these critical design points with minimal performance impact. We also develop a methodology for energy-performance optimization at the microarchitecture level; for a given microarchitecture, it generates a set of energy-efficient configurations that form a convex hull in the power-performance space. Using this methodology, we perform detailed simulations of the baseline and the proposed multicluster architectures. A comparison of the two microarchitectures, both optimized for energy efficiency, shows that the multicluster architecture is potentially up to twice as energy efficient for wide-issue processors, with an advantage that grows with the issue width. Conversely, at the same power dissipation level, the multicluster architecture supports configurations with measurably higher performance than equivalent conventional designs.
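The abstract does not spell out how the set of energy-efficient configurations is computed, so the following is only a minimal sketch of one plausible reading: given simulated (power, performance) points for candidate configurations, keep the Pareto-optimal ones and then take their upper convex hull. The function names, the tuple layout, and the sample numbers are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical representation: (power, performance) in arbitrary units.
Config = Tuple[float, float]


def cross(o: Config, a: Config, b: Config) -> float:
    """Cross product of (a - o) and (b - o); positive means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def efficient_frontier(configs: List[Config]) -> List[Config]:
    """Return the configurations on the upper convex boundary of the
    power-performance cloud, ordered by increasing power."""
    # Step 1: Pareto filter. Walk the points by increasing power and keep
    # a point only if it improves on the best performance seen so far.
    pareto: List[Config] = []
    best = float("-inf")
    for p in sorted(set(configs), key=lambda c: (c[0], -c[1])):
        if p[1] > best:
            pareto.append(p)
            best = p[1]

    # Step 2: upper convex hull (monotone chain). Drop any point that lies
    # on or below the segment joining its neighbours.
    hull: List[Config] = []
    for p in pareto:
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


# Made-up sample data, for illustration only.
sample = [(1, 1), (2, 3), (3, 3.5), (4, 6), (5, 6.2), (3, 2), (2, 1.5)]
frontier = efficient_frontier(sample)
print(frontier)
```

In this sketch, a point such as (3, 3.5) is Pareto-optimal but still dropped, because it lies below the line between (2, 3) and (4, 6). Whether the paper's methodology treats such points the same way is an assumption here.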