A VLIW architecture for a trace Scheduling Compiler
IEEE Transactions on Computers - Special issue on architectural support for programming languages and operating systems
The expandable split window paradigm for exploiting fine-grain parallelsim
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
The multiscalar architecture
Performance evaluation of the PowerPC 620 microarchitecture
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Trace cache: a low latency approach to high bandwidth instruction fetching
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
The design of a high performance low power microprocessor
ISLPED '96 Proceedings of the 1996 international symposium on Low power electronics and design
Improving superscalar instruction dispatch and issue by exploiting dynamic code sequences
Proceedings of the 24th annual international symposium on Computer architecture
Complexity-effective superscalar processors
Proceedings of the 24th annual international symposium on Computer architecture
The multicluster architecture: reducing cycle time through partitioning
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
The energy complexity of register files
ISLPED '98 Proceedings of the 1998 international symposium on Low power electronics and design
Application of STD to latch-power estimation
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Speculation techniques for improving load related instruction scheduling
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Correlated load-address predictors
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Optimization of high-performance superscalar architectures for energy efficiency
ISLPED '00 Proceedings of the 2000 international symposium on Low power electronics and design
The MIPS R10000 Superscalar Microprocessor
IEEE Micro
IEEE Micro
The Alpha 21264 Microprocessor
IEEE Micro
A three dimensional register file for superscalar processors
HICSS '95 Proceedings of the 28th Hawaii International Conference on System Sciences
Inherently lower-power high-performance superscalar architectures
Inherently lower-power high-performance superscalar architectures
Unified architecture level energy-efficiency metric
Proceedings of the 12th ACM Great Lakes symposium on VLSI
Hardware and Software Techniques for Controlling DRAM Power Modes
IEEE Transactions on Computers
A Register File Architecture and Compilation Scheme for Clustered ILP Processors
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
A Holistic Approach to System Level Energy Optimization
PATMOS '00 Proceedings of the 10th International Workshop on Integrated Circuit Design, Power and Timing Modeling, Optimization and Simulation
Banked multiported register files for high-frequency superscalar microprocessors
Proceedings of the 30th annual international symposium on Computer architecture
Improving dynamic cluster assignment for clustered trace cache processors
Proceedings of the 30th annual international symposium on Computer architecture
Dynamically managing the communication-parallelism trade-off in future clustered processors
Proceedings of the 30th annual international symposium on Computer architecture
Exploiting compiler-generated schedules for energy savings in high-performance processors
Proceedings of the 2003 international symposium on Low power electronics and design
Access Pattern Restructuring for Memory Energy
IEEE Transactions on Parallel and Distributed Systems
Combining compiler and runtime IPC predictions to reduce energy in next generation architectures
Proceedings of the 1st conference on Computing frontiers
Cluster prefetch: tolerating on-chip wire delays in clustered microarchitectures
Proceedings of the 18th annual international conference on Supercomputing
Late Allocation and Early Release of Physical Registers
IEEE Transactions on Computers
Scaling into Ambient Intelligence
DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Dynamic Task-Level Voltage Scheduling Optimizations
IEEE Transactions on Computers
A Dependency Chain Clustered Microarchitecture
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Clustered Loop Buffer Organization for Low Energy VLIW Embedded Processors
IEEE Transactions on Computers
A Speculative Control Scheme for an Energy-Efficient Banked Register File
IEEE Transactions on Computers
Store Buffer Design in First-Level Multibanked Data Caches
Proceedings of the 32nd annual international symposium on Computer Architecture
Understanding the energy efficiency of SMT and CMP with multiclustering
ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
Low-power, low-complexity instruction issue using compiler assistance
Proceedings of the 19th annual international conference on Supercomputing
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Hybrid-scheduling for reduced energy consumption in high-performance processors
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Core fusion: accommodating software diversity in chip multiprocessors
Proceedings of the 34th annual international symposium on Computer architecture
Integration, the VLSI Journal
Exploiting virtual registers to reduce pressure on real registers
ACM Transactions on Architecture and Code Optimization (TACO)
Hardware support for early register release
International Journal of High Performance Computing and Networking
Addressing thermal nonuniformity in SMT workloads
ACM Transactions on Architecture and Code Optimization (TACO)
Energy-aware register file re-partitioning for clustered VLIW architectures
Proceedings of the 2009 Asia and South Pacific Design Automation Conference
An energy-efficient instruction scheduler design with two-level shelving and adaptive banking
Journal of Computer Science and Technology
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Virtual registers: reducing register pressure without enlarging the register file
HiPEAC'07 Proceedings of the 2nd international conference on High performance embedded architectures and compilers
On the latency and energy of checkpointed superscalar register alias tables
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Applied inference: Case studies in microarchitectural design
ACM Transactions on Architecture and Code Optimization (TACO)
CoreSymphony: an efficient reconfigurable multi-core architecture
ACM SIGARCH Computer Architecture News
Low power microprocessor design for embedded systems
ICCSA'06 Proceedings of the 2006 international conference on Computational Science and Its Applications - Volume Part IV
CRAM: coded registers for amplified multiporting
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Compiler-driven leakage energy reduction in banked register files
PATMOS'06 Proceedings of the 16th international conference on Integrated Circuit and System Design: power and Timing Modeling, Optimization and Simulation
Proceedings of the 26th ACM international conference on Supercomputing
Hi-index | 15.00 |
In recent years, reducing power has become an important design goal for high-performance microprocessors. This work attempts to bring the power issue to the earliest phases of microprocessor development, in particular, the stage of defining a chip microarchitecture. We investigate power-optimization techniques of superscalar microprocessors at the microarchitecture level that do not compromise performance. First, major targets for power reduction are identified within microarchitecture, where power is heavily consumed or will be heavily consumed in next-generation superscalar processors. Then, a new, energy-efficient version of a multicluster microarchitecture is developed that reduces energy in the identified critical design points with minimal performance impact. A methodology is developed for energy-performance optimization at the microarchitecture level that generates, for a microarchitecture, a set of energy-efficient configurations, forming a convex hull in the power-performance space. Detailed simulation of the baseline and proposed multicluster architectures has been performed using the developed optimization methodology. A comparison of the two microarchitectures, both optimized for energy efficiency, shows that the multicluster architecture is potentially up to twice as energy efficient for wide issue processors, with an advantage that grows with the issue width. Conversely, at the same power dissipation level, the multicluster architecture supports configurations with measurably higher performance than equivalent conventional designs.