The effect on RISC performance of register set size and structure versus code generation strategy
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Design of the IBM Enterprise System/9000 high-end processor
IBM Journal of Research and Development
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
An out-of-order superscalar processor with speculative execution and fast, precise interrupts
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
ATOM: a system for building customized program analysis tools
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Complexity/performance tradeoffs with non-blocking loads
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
The PowerPC 604 RISC microprocessor
IEEE Micro
The IBM system/360 model 91: machine philosophy and instruction-handling
IBM Journal of Research and Development
The case for a single-chip multiprocessor
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Complexity-effective superscalar processors
Proceedings of the 24th annual international symposium on Computer architecture
Exploiting dead value information
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Delaying physical register allocation through virtual-physical registers
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Software-Directed Register Deallocation for Simultaneous Multithreaded Processors
IEEE Transactions on Parallel and Distributed Systems
Multiple-banked register file architectures
Proceedings of the 27th annual international symposium on Computer architecture
IEEE Transactions on Computers
Dynamically allocating processor resources between nearby and distant ILP
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
SMT Layout Overhead and Scalability
IEEE Transactions on Parallel and Distributed Systems
Reducing the complexity of the register file in dynamic superscalar processors
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Run-Time Support to Register Allocation for Loop Parallelization of Image Processing Programs
HPCN Europe 2000 Proceedings of the 8th International Conference on High-Performance Computing and Networking
Reducing register ports using delayed write-back queues and operand pre-fetch
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Exploiting Value Locality in Physical Register Files
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Reducing register pressure through LAER algorithm
ACSC '04 Proceedings of the 27th Australasian conference on Computer science - Volume 26
Proceedings of the 31st annual international symposium on Computer architecture
Late Allocation and Early Release of Physical Registers
IEEE Transactions on Computers
Cherry-MP: Correctly Integrating Checkpointed Early Resource Recycling in Chip Multiprocessors
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Speculative early register release
Proceedings of the 3rd conference on Computing frontiers
Register port complexity reduction in wide-issue processors with selective instruction execution
Microprocessors & Microsystems
Compacting register file via 2-level renaming and bit-partitioning
Microprocessors & Microsystems
Hardware support for early register release
International Journal of High Performance Computing and Networking
Achieving Out-of-Order Performance with Almost In-Order Complexity
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Investigating the effects of fine-grain three-dimensional integration on microarchitecture design
ACM Journal on Emerging Technologies in Computing Systems (JETC)
Efficient compilation for queue size constrained queue processors
Parallel Computing
Virtual registers: reducing register pressure without enlarging the register file
HiPEAC'07 Proceedings of the 2nd international conference on High performance embedded architectures and compilers
A configurable multi-ported register file architecture for soft processor cores
ARC'07 Proceedings of the 3rd international conference on Reconfigurable computing: architectures, tools and applications
CRAM: coded registers for amplified multiporting
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Integrated modulo scheduling and cluster assignment for TI TMS320C64x+ architecture
Proceedings of the 11th Workshop on Optimizations for DSP and Embedded Systems
Hi-index | 0.01 |
We have investigated the register file requirements of dynamically scheduled processors using register renaming and dispatch queues running the SPEC92 benchmarks. We looked at processors capable of issuing either four or eight instructions per cycle and found that in most cases implementing precise exceptions requires a relatively small number of additional registers compared to imprecise exceptions. Systems with aggressive non-blocking load support were able to achieve performance similar to processors with perfect memory systems at the cost of some additional registers. Given our machine assumptions, we found that the performance of a four-issue machine with a 32-entry dispatch queue tends to saturate around 80 registers. For an eight-issue machine with a 64-entry dispatch queue performance does not saturate until about 128 registers. Assuming the machine cycle time is proportional to the register file cycle time, the 8-issue machine yields only 20% higher performance than the 4-issue machine due in part to the cycle time impact of additional hardware.