Half-price architecture

Authors:
Ilhyun Kim;Mikko H. Lipasti
Affiliations:
University of Wisconsin-Madison;University of Wisconsin-Madison
Venue:
Proceedings of the 30th annual international symposium on Computer architecture
Year:
2003

Citing 15
Cited 27

Register file port requirements of transport triggered architectures

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Complexity-effective superscalar processors

Proceedings of the 24th annual international symposium on Computer architecture
Multiple-banked register file architectures

Proceedings of the 27th annual international symposium on Computer architecture
On pipelining dynamic instruction scheduling logic

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Focusing processor policies via critical-path prediction

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
The optimal logic depth per pipeline stage is 6 to 8 FO4 inverter delays

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Efficient dynamic scheduling through tag elimination

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
A large, fast instruction window for tolerating cache misses

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Implementing optimizations at decode time

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Select-free instruction scheduling logic

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Reducing the complexity of the register file in dynamic superscalar processors

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Reducing register ports for higher speed and lower energy

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
A three dimensional register file for superscalar processors

HICSS '95 Proceedings of the 28th Hawaii International Conference on System Sciences
Data-Flow Prescheduling for Large Instruction Windows in Out-of-Order Processors

HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Dynamic Prediction of Critical Path Instructions

HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture

Macro-op Scheduling: Relaxing Scheduling Loop Constraints

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Using Dynamic Binary Translation to Fuse Dependent Instructions

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Dynamic Strands: Collapsing Speculative Dependence Chains for Reducing Pipeline Communication

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
An efficient wakeup design for energy reduction in high-performance superscalar processors

Proceedings of the 2nd conference on Computing frontiers
Static strands: safely collapsing dependence chains for increasing embedded power efficiency

LCTES '05 Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Instruction packing: reducing power and delay of the dynamic scheduling logic

ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
A New Pointer-based Instruction Queue Design and Its Power-Performance Evaluation

ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Power-Efficient Wakeup Tag Broadcast

ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Reducing Rename Logic Complexity for High-Speed and Low-Power Front-End Architectures

IEEE Transactions on Computers
Instruction packing: Toward fast and energy-efficient instruction scheduling

ACM Transactions on Architecture and Code Optimization (TACO)
SEED: scalable, efficient enforcement of dependences

Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Energy-efficient dynamic instruction scheduling logic through instruction grouping

Proceedings of the 2006 international symposium on Low power electronics and design
A scalable low power issue queue for large instruction window processors

Proceedings of the 20th annual international conference on Supercomputing
Exploiting Operand Availability for Efficient Simultaneous Multithreading

IEEE Transactions on Computers
Register port complexity reduction in wide-issue processors with selective instruction execution

Microprocessors & Microsystems
By-passing the out-of-order execution pipeline to increase energy-efficiency

Proceedings of the 4th international conference on Computing frontiers
Matrix scheduler reloaded

Proceedings of the 34th annual international symposium on Computer architecture
Static strands: Safely exposing dependence chains for increasing embedded power efficiency

ACM Transactions on Embedded Computing Systems (TECS) - Special Section LCTES'05
Scalable Dynamic Instruction Scheduler through Wake-Up Spatial Locality

IEEE Transactions on Computers
Early detection and bypassing of trivial operations to improve energy efficiency of processors

Microprocessors & Microsystems
Process variation aware issue queue design

Proceedings of the conference on Design, automation and test in Europe
Streamlining long latency instructions for seamlessly combined out-of-order and in-order execution

Microprocessors & Microsystems
Forwardflow: a scalable core for power-constrained CMPs

Proceedings of the 37th annual international symposium on Computer architecture
Wake-up logic optimizations through selective match and wakeup range limitation

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
CRIB: consolidated rename, issue, and bypass

Proceedings of the 38th annual international symposium on Computer architecture
Instruction recirculation: eliminating counting logic in wakeup-free schedulers

Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
Reducing delay and power consumption of the wakeup logic through instruction packing and tag memoization

PACS'04 Proceedings of the 4th international conference on Power-Aware Computer Systems

Quantified Score

Hi-index	0.01

Visualization

Abstract

Current-generation microprocessors are designed to process instructions with one and two source operands at equal cost. Handling two source operands requires multiple ports for each instruction in structures--such as the register file and wakeup logic--which are often in the processor's critical timing paths. We argue that these structures are overdesigned since only a small fraction of instructions require two source operands to be processed simultaneously. In this paper, we propose the half-price architecture that judiciously removes this overdesign by restricting the processor's capability to handle two source operands in certain timing-critical cases. Two techniques are proposed and evaluated: one for the wakeup logic is sequential wakeup, which decouples half of the tag matching logic from the wakeup bus to reduce the load capacitance of the bus. The other technique for the register file is sequential register access, which halves the register read ports by sequentially accessing two values using a single port when needed. We show that a pipeline that optimizes scheduling and register access for a single operand achieves nearly the same performance as an ideal base machine that fully handles two operands, with 2.2% (worst case 4.8%) IPC degradation.