Design of the IBM Enterprise System/9000 high-end processor
IBM Journal of Research and Development
Complexity-effective superscalar processors
Proceedings of the 24th annual international symposium on Computer architecture
Power considerations in the design of the Alpha 21264 microprocessor
DAC '98 Proceedings of the 35th annual Design Automation Conference
The energy complexity of register files
ISLPED '98 Proceedings of the 1998 international symposium on Low power electronics and design
Implementation of precise interrupts in pipelined processors
ISCA '85 Proceedings of the 12th annual international symposium on Computer architecture
Multiple-banked register file architectures
Proceedings of the 27th annual international symposium on Computer architecture
Two-level hierarchical register file organization for VLIW processors
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Reducing the complexity of the register file in dynamic superscalar processors
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
The MIPS R10000 Superscalar Microprocessor
IEEE Micro
The Alpha 21264 Microprocessor
IEEE Micro
Reducing register ports for higher speed and lower energy
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Using SimPoint for accurate and efficient simulation
SIGMETRICS '03 Proceedings of the 2003 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Reducing register ports using delayed write-back queues and operand pre-fetch
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Banked multiported register files for high-frequency superscalar microprocessors
Proceedings of the 30th annual international symposium on Computer architecture
A Content Aware Integer Register File Organization
Proceedings of the 31st annual international symposium on Computer architecture
An efficient algorithm for exploiting multiple arithmetic units
IBM Journal of Research and Development
POWER4 system microarchitecture
IBM Journal of Research and Development
The majority of register file designs follow one of two well-known approaches. Many modern high-performance processors (POWER4 [1], Pentium4 [2]) use a merged register file that holds both architectural and rename registers. Other processors use a Future File (e.g., Opteron [3]), with rename registers kept separately in reservation stations. Both approaches have issues that may limit their application in future microprocessors. The merged register file scales poorly in terms of power and performance, while the Future File pays a large penalty on branch misprediction recovery. In addition, the Future File requires reservation stations, a less scalable mechanism. This paper proposes to combine the best aspects of the traditional Future File architecture with those of the merged physical register file. The key point is that the new architecture separates the processor state, in particular the registers, from the execution units in the pipeline back-end; hence the name Decoupled State-Execute Architecture. The resulting register file can be accessed in the pipeline front-end and has several desirable properties that allow efficient application of several optimizations, most notably register file banking and a novel writeback filtering mechanism. With aggressive banking, only a 1.0% IPC degradation was observed, and the new writeback filtering technique lowered energy consumption. Together, the two optimizations remove approximately 80% of the energy consumed in the register file data array.
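To illustrate why register file banking can cost some IPC, the following minimal sketch counts the cycles needed to read a group of source registers when each bank has a limited number of read ports. It is a generic toy model and not the paper's design: the low-order interleaving in `bank_of`, the `ports_per_bank` parameter, and the rule that duplicate reads of one register share a single access are all assumptions made for illustration.

```python
from collections import Counter


def bank_of(reg: int, num_banks: int) -> int:
    """Map a register number to a bank (assumed low-order interleaving)."""
    return reg % num_banks


def read_cycles(regs, num_banks: int, ports_per_bank: int = 1) -> int:
    """Cycles needed to read all registers in `regs`.

    Each bank serves at most `ports_per_bank` reads per cycle, so reads
    that land in the same bank are serialized. Duplicate reads of the
    same register are assumed to share one access.
    """
    unique_regs = set(regs)
    if not unique_regs:
        return 0
    reads_per_bank = Counter(bank_of(r, num_banks) for r in unique_regs)
    # Ceiling division: the busiest bank determines the total latency.
    return max(-(-n // ports_per_bank) for n in reads_per_bank.values())


if __name__ == "__main__":
    # Registers 1..4 fall into four different banks: no conflict.
    print(read_cycles([1, 2, 3, 4], num_banks=4))
    # Registers 0, 4, 8 all map to bank 0: the reads are serialized.
    print(read_cycles([0, 4, 8], num_banks=4))
```

In this model, extra cycles appear only when several operands collide in one bank, which is consistent with banking trading a small performance loss for simpler, lower-energy register file arrays.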