Computer architecture: a quantitative approach
Computer architecture: a quantitative approach
Partitioned register files for VLIWs: a preliminary analysis of tradeoffs
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Complexity-effective superscalar processors
Proceedings of the 24th annual international symposium on Computer architecture
The multicluster architecture: reducing cycle time through partitioning
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
Multiple-banked register file architectures
Proceedings of the 27th annual international symposium on Computer architecture
A three-port nRERL register file for ultra-low-energy applications
ISLPED '00 Proceedings of the 2000 international symposium on Low power electronics and design
Breathing life into a paper tiger (keynote session)
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Two-level hierarchical register file organization for VLIW processors
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Reducing the complexity of the register file in dynamic superscalar processors
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Energy-Efficient Register Access
SBCCI '00 Proceedings of the 13th symposium on Integrated circuits and systems design
HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Reducing register ports using delayed write-back queues and operand pre-fetch
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Proceedings of the 30th annual international symposium on Computer architecture
Banked multiported register files for high-frequency superscalar microprocessors
Proceedings of the 30th annual international symposium on Computer architecture
Reducing reorder buffer complexity through selective operand caching
Proceedings of the 2003 international symposium on Low power electronics and design
The microarchitecture of a low power register file
Proceedings of the 2003 international symposium on Low power electronics and design
Reducing Design Complexity of the Load/Store Queue
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Leakage Energy Reduction in Register Renaming
ICDCSW '04 Proceedings of the 24th International Conference on Distributed Computing Systems Workshops - W7: EC (ICDCSW'04) - Volume 7
Complexity-Effective Reorder Buffer Designs for Superscalar Processors
IEEE Transactions on Computers
Isolating Short-Lived Operands for Energy Reduction
IEEE Transactions on Computers
A Content Aware Integer Register File Organization
Proceedings of the 31st annual international symposium on Computer architecture
Late Allocation and Early Release of Physical Registers
IEEE Transactions on Computers
Area and System Clock Effects on SMT/CMP Throughput
IEEE Transactions on Computers
Dynamic Strands: Collapsing Speculative Dependence Chains for Reducing Pipeline Communication
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Balanced Multithreading: Increasing Throughput via a Low Cost Multithreading Hierarchy
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Register Packing: Exploiting Narrow-Width Operands for Reducing Register File Pressure
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
A Speculative Control Scheme for an Energy-Efficient Banked Register File
IEEE Transactions on Computers
Static strands: safely collapsing dependence chains for increasing embedded power efficiency
LCTES '05 Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Compiler Directed Early Register Release
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Reducing the Energy of Speculative Instruction Schedulers
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Bypass aware instruction scheduling for register file power reduction
Proceedings of the 2006 ACM SIGPLAN/SIGBED conference on Language, compilers, and tool support for embedded systems
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Selective writeback: exploiting transient values for energy-efficiency and performance
Proceedings of the 2006 international symposium on Low power electronics and design
Register port complexity reduction in wide-issue processors with selective instruction execution
Microprocessors & Microsystems
Static strands: Safely exposing dependence chains for increasing embedded power efficiency
ACM Transactions on Embedded Computing Systems (TECS) - Special Section LCTES'05
Integration, the VLSI Journal
IEEE Transactions on Computers
Defect Analysis and Defect Tolerant Design of Multi-port SRAMs
Journal of Electronic Testing: Theory and Applications
International Journal of High Performance Computing and Networking
Asymmetrically banked value-aware register files for low-energy and high-performance
Microprocessors & Microsystems
Placement-and-routing-based register allocation for coarse-grained reconfigurable arrays
Proceedings of the 2008 ACM SIGPLAN-SIGBED conference on Languages, compilers, and tools for embedded systems
Selective writeback: reducing register file pressure and energy consumption
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A distributed processor state management architecture for large-window processors
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Exploring the limits of early register release: Exploiting compiler analysis
ACM Transactions on Architecture and Code Optimization (TACO)
Energy-efficient renaming with register versioning
Proceedings of the 14th ACM/IEEE international symposium on Low power electronics and design
Energy-efficient register caching with compiler assistance
ACM Transactions on Architecture and Code Optimization (TACO)
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Decoupled state-execute architecture
ISHPC'05/ALPS'06 Proceedings of the 6th international symposium on high-performance computing and 1st international conference on Advanced low power systems
Exploiting narrow-width values for thermal-aware register file designs
Proceedings of the Conference on Design, Automation and Test in Europe
Performance evaluation of superscalar processor with multi-bank register file using SPEC2000
ICCOMP'06 Proceedings of the 10th WSEAS international conference on Computers
CRIB: consolidated rename, issue, and bypass
Proceedings of the 38th annual international symposium on Computer architecture
Do trace cache, value prediction and prefetching improve SMT throughput?
ARCS'06 Proceedings of the 19th international conference on Architecture of Computing Systems
CRAM: coded registers for amplified multiporting
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
A compile-time managed multi-level register file hierarchy
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
An optimized front-end physical register file with banking and writeback filtering
PACS'04 Proceedings of the 4th international conference on Power-Aware Computer Systems
Compiler-driven leakage energy reduction in banked register files
PATMOS'06 Proceedings of the 16th international conference on Integrated Circuit and System Design: power and Timing Modeling, Optimization and Simulation
Exploiting narrow values for energy efficiency in the register files of superscalar microprocessors
PATMOS'06 Proceedings of the 16th international conference on Integrated Circuit and System Design: power and Timing Modeling, Optimization and Simulation
A Hierarchical Thread Scheduler and Register File for Energy-Efficient Throughput Processors
ACM Transactions on Computer Systems (TOCS)
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
Shared-port register file architecture for low-energy VLIW processors
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.02 |
The key issues for register file design in high-performance processors are access time and energy. While previous work has focused on reducing the number of registers, we propose to reduce the number of register ports through two proposals, one for reads and the other for writes. For reads, we propose bypass hint to reduce register port requirements by avoiding unnecessary register file reads for cases where values are bypassed. Current processors are unable to avoid these unnecessary reads due to timing constraints. For writes, we use register file banking. Current banking schemes assign different banks to instructions that are renamed together, which does not necessarily avoid conflicts among instructions that writeback together. We use decoupled rename, a technique which separates dependence and physical tagging of register operands. Decoupled rename allows us to perform physical register allocation just before writeback, avoiding bank conflicts. Our results show that combining bypass hint and write banking, our 1-cycle register file with 6 read ports, and two 4-write-ported banks achieves a 9% processor energy-delay savings over a system using a perfectly-pipelined, 2-cycle register file with 16 read ports and 8 write ports.