ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Complexity-effective superscalar processors
Proceedings of the 24th annual international symposium on Computer architecture
The multicluster architecture: reducing cycle time through partitioning
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
The energy complexity of register files
ISLPED '98 Proceedings of the 1998 international symposium on Low power electronics and design
Delaying physical register allocation through virtual-physical registers
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
Multiple-banked register file architectures
Proceedings of the 27th annual international symposium on Computer architecture
Inherently Lower-Power High-Performance Superscalar Architectures
IEEE Transactions on Computers
Reducing the complexity of the register file in dynamic superscalar processors
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Automatically characterizing large scale program behavior
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
The Alpha 21264 Microprocessor
IEEE Micro
Basic Block Distribution Analysis to Find Periodic Behavior and Simulation Points in Applications
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Reducing register ports for higher speed and lower energy
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Register File Design Considerations in Dynamically Scheduled Processors
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Banked multiported register files for high-frequency superscalar microprocessors
Proceedings of the 30th annual international symposium on Computer architecture
A Scalable Register File Architecture for Dynamically Scheduled Processors
PACT '96 Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques
Pinpointing Representative Portions of Large Intel® Itanium® Programs with Dynamic Instrumentation
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Network coding: an instant primer
ACM SIGCOMM Computer Communication Review
The Danger of Interval-Based Power Efficiency Metrics: When Worst Is Best
IEEE Computer Architecture Letters
High-Performance Embedded Computing: Architectures, Applications, and Methodologies
High-Performance Embedded Computing: Architectures, Applications, and Methodologies
Analysis of redundancy and application balance in the SPEC CPU2006 benchmark suite
Proceedings of the 34th annual international symposium on Computer architecture
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Proceedings of the 38th annual international symposium on Computer architecture
Hi-index | 0.00 |
Modern out-of-order processors require a large number of register file access ports. However, adding more ports can drastically increase the delay, power and area of the register file. This relationship imposes constraints on existing superscalar designs while impeding implementation of faster and wider out-of-order processors. In this paper, we present a novel multi-ported register file using concepts from network coding. We split a true multi-ported register file into two interleaved banks, each having half the read and write ports. A third bank, storing the XOR of the write backs to the other two banks, is added to amplify the read and write bandwidth. When compared to a conventional register file, our 8R4W 128-entry coded CRAM register file reduces leakage power by 48%, area by 29% and delay by 9%. In addition, for SPEC2006 benchmarks, our implementation consumes 40% less register file dynamic energy on average with IPC degradation of 3%.