Register File Design Considerations in Dynamically Scheduled Processors

Authors:
Keith I. Farkas; Paul Chow; Norman P. Jouppi
Venue:
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
Year:
1996

Abstract

We have investigated the register file requirements of dynamically scheduled processors that use register renaming and dispatch queues, running the SPEC92 benchmarks. We looked at processors capable of issuing either four or eight instructions per cycle and found that, in most cases, implementing precise exceptions requires relatively few additional registers compared to imprecise exceptions. Systems with aggressive non-blocking load support achieved performance similar to processors with perfect memory systems, at the cost of some additional registers. Given our machine assumptions, we found that the performance of a four-issue machine with a 32-entry dispatch queue tends to saturate at around 80 registers. For an eight-issue machine with a 64-entry dispatch queue, performance does not saturate until about 128 registers. Assuming the machine cycle time is proportional to the register file cycle time, the eight-issue machine yields only 20% higher performance than the four-issue machine, due in part to the cycle time impact of the additional hardware.
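
The closing claim of the abstract combines two effects: a wider machine may complete more instructions per cycle, but a larger register file may also lengthen the cycle. A minimal sketch of that arithmetic follows, assuming performance is modeled as instructions per cycle divided by cycle time. The function name and every numeric input are hypothetical placeholders, not values reported in the paper; they were chosen only so that the result lands on a 20% gain.

```python
def relative_performance(ipc_wide, ipc_narrow, cycle_wide, cycle_narrow):
    """Return the speedup of the wide machine over the narrow one.

    Performance is modeled as IPC / cycle_time, so the speedup is the
    IPC ratio divided by the cycle-time ratio.
    """
    if min(ipc_wide, ipc_narrow, cycle_wide, cycle_narrow) <= 0:
        raise ValueError("all inputs must be positive")
    return (ipc_wide / ipc_narrow) / (cycle_wide / cycle_narrow)


# Hypothetical inputs (NOT from the paper): the wide machine sustains
# 50% more instructions per cycle but has a 25% longer cycle time.
speedup = relative_performance(
    ipc_wide=3.0, ipc_narrow=2.0, cycle_wide=1.25, cycle_narrow=1.0
)
print(f"speedup: {speedup:.2f}x")
```

Under these made-up inputs, a 50% gain in instructions per cycle shrinks to a 20% gain in delivered performance once the longer cycle is taken into account, which is the kind of trade-off the abstract describes.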