Hierarchical registers for scientific computers
ICS '88 Proceedings of the 2nd international conference on Supercomputing
Simultaneous multithreading: maximizing on-chip parallelism
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Complexity-effective superscalar processors
Proceedings of the 24th annual international symposium on Computer architecture
Readings in computer architecture
The Alpha 21264 Microprocessor
IEEE Micro
Caching processor general registers
ICCD '95 Proceedings of the 1995 International Conference on Computer Design: VLSI in Computers and Processors
Using Sacks to Organize Registers in VLIW Machines
CONPAR 94 - VAPP VI Proceedings of the Third Joint International Conference on Vector and Parallel Processing: Parallel Processing
Non-Consistent Dual Register Files to Reduce Register Pressure
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Register File Design Considerations in Dynamically Scheduled Processors
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
A Scalable Register File Architecture for Dynamically Scheduled Processors
PACT '96 Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques
Proceedings of the 14th international conference on Supercomputing
Two-level hierarchical register file organization for VLIW processors
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Register integration: a simple and efficient implementation of squash reuse
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Data and memory optimization techniques for embedded systems
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Reducing the complexity of the issue logic
ICS '01 Proceedings of the 15th international conference on Supercomputing
Integrating superscalar processor components to implement register caching
ICS '01 Proceedings of the 15th international conference on Supercomputing
Dynamically allocating processor resources between nearby and distant ILP
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Power and energy reduction via pipeline balancing
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Measuring experimental error in microprocessor simulation
SSR '01 Proceedings of the 2001 symposium on Software reusability: putting software reuse in context
Power-aware partitioned cache architectures
ISLPED '01 Proceedings of the 2001 international symposium on Low power electronics and design
Errata on "Measuring Experimental Error in Microprocessor Simulation"
ACM SIGARCH Computer Architecture News
Low-complexity reorder buffer architecture
ICS '02 Proceedings of the 16th international conference on Supercomputing
A large, fast instruction window for tolerating cache misses
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Reducing the complexity of the register file in dynamic superscalar processors
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Measuring Experimental Error in Microprocessor Simulation
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Partitioned instruction cache architecture for energy efficiency
ACM Transactions on Embedded Computing Systems (TECS)
Memory Architectures for Embedded Systems-On-Chip
HiPC '02 Proceedings of the 9th International Conference on High Performance Computing
A Register File Architecture and Compilation Scheme for Clustered ILP Processors
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Cherry: checkpointed early resource recycling in out-of-order microprocessors
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Characterizing and predicting value degree of use
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Reducing register ports for higher speed and lower energy
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Reducing register ports using delayed write-back queues and operand pre-fetch
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Mini-Threads: Increasing TLP on Small-Scale SMT Processors
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Proceedings of the 30th annual international symposium on Computer architecture
Banked multiported register files for high-frequency superscalar microprocessors
Proceedings of the 30th annual international symposium on Computer architecture
Reducing reorder buffer complexity through selective operand caching
Proceedings of the 2003 international symposium on Low power electronics and design
The microarchitecture of a low power register file
Proceedings of the 2003 international symposium on Low power electronics and design
Increasing the number of effective registers in a low-power processor using a windowed register file
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Exploiting Value Locality in Physical Register Files
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Reducing register pressure through LAER algorithm
ACSC '04 Proceedings of the 27th Australasian conference on Computer science - Volume 26
Complexity-Effective Reorder Buffer Designs for Superscalar Processors
IEEE Transactions on Computers
Isolating Short-Lived Operands for Energy Reduction
IEEE Transactions on Computers
Energy Efficient Comparators for Superscalar Datapaths
IEEE Transactions on Computers
Use-Based Register Caching with Decoupled Indexing
Proceedings of the 31st annual international symposium on Computer architecture
A Content Aware Integer Register File Organization
Proceedings of the 31st annual international symposium on Computer architecture
Proceedings of the 31st annual international symposium on Computer architecture
Late Allocation and Early Release of Physical Registers
IEEE Transactions on Computers
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Balanced Multithreading: Increasing Throughput via a Low Cost Multithreading Hierarchy
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Register Packing: Exploiting Narrow-Width Operands for Reducing Register File Pressure
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
An analysis of a resource efficient checkpoint architecture
ACM Transactions on Architecture and Code Optimization (TACO)
Evaluation of Bus Based Interconnect Mechanisms in Clustered VLIW Architectures
Proceedings of the conference on Design, Automation and Test in Europe - Volume 2
Control-Flow Independence Reuse via Dynamic Vectorization
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
A Speculative Control Scheme for an Energy-Efficient Banked Register File
IEEE Transactions on Computers
Partitioning Variables across Register Windows to Reduce Spill Code in a Low-Power Processor
IEEE Transactions on Computers
An asymmetric clustered processor based on value content
Proceedings of the 19th annual international conference on Supercomputing
Compiler Directed Early Register Release
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Speculative execution for hiding memory latency
MEDEA '04 Proceedings of the 2004 workshop on MEmory performance: DEaling with Applications , systems and architecture
Software and hardware techniques to optimize register file utilization in VLIW architectures
International Journal of Parallel Programming
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Throttling-Based Resource Management in High Performance Multithreaded Architectures
IEEE Transactions on Computers
Early Register Deallocation Mechanisms Using Checkpointed Register Files
IEEE Transactions on Computers
Selective writeback: exploiting transient values for energy-efficiency and performance
Proceedings of the 2006 international symposium on Low power electronics and design
Register file caching for energy efficiency
Proceedings of the 2006 international symposium on Low power electronics and design
Impact of intercluster communication mechanisms on ILP in clustered VLIW architectures
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Register port complexity reduction in wide-issue processors with selective instruction execution
Microprocessors & Microsystems
Using fine grain multithreading for energy efficient computing
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Compacting register file via 2-level renaming and bit-partitioning
Microprocessors & Microsystems
A cache design for high performance embedded systems
Journal of Embedded Computing - Cache exploitation in embedded systems
Unified microprocessor core storage
Proceedings of the 4th international conference on Computing frontiers
Integration, the VLSI Journal
Building a large instruction window through ROB compression
MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
Exploiting virtual registers to reduce pressure on real registers
ACM Transactions on Architecture and Code Optimization (TACO)
IEEE Transactions on Computers
Hardware support for early register release
International Journal of High Performance Computing and Networking
International Journal of High Performance Computing and Networking
Asymmetrically banked value-aware register files for low-energy and high-performance
Microprocessors & Microsystems
Proceedings of the 2008 ACM SIGPLAN-SIGBED conference on Languages, compilers, and tools for embedded systems
Achieving Out-of-Order Performance with Almost In-Order Complexity
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Selective writeback: reducing register file pressure and energy consumption
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Evaluation of bus based interconnect mechanisms in clustered VLIW architectures
International Journal of Parallel Programming
Exploring the limits of early register release: Exploiting compiler analysis
ACM Transactions on Architecture and Code Optimization (TACO)
Energy-efficient register caching with compiler assistance
ACM Transactions on Architecture and Code Optimization (TACO)
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Virtual registers: reducing register pressure without enlarging the register file
HiPEAC'07 Proceedings of the 2nd international conference on High performance embedded architectures and compilers
A configurable multi-ported register file architecture for soft processor cores
ARC'07 Proceedings of the 3rd international conference on Reconfigurable computing: architectures, tools and applications
Decoupled state-execute architecture
ISHPC'05/ALPS'06 Proceedings of the 6th international symposium on high-performance computing and 1st international conference on Advanced low power systems
Proceedings of the 7th ACM international conference on Computing frontiers
Exploiting dynamic reconfiguration techniques: the 2D-VLIW approach
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Register Cache System Not for Latency Reduction Purpose
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Saving register-file static power by monitoring instruction sequence in ROB
Journal of Systems Architecture: the EUROMICRO Journal
CROB: implementing a large instruction window through compression
Transactions on high-performance embedded architectures and compilers III
Energy-efficient mechanisms for managing thread context in throughput processors
Proceedings of the 38th annual international symposium on Computer architecture
2L-MuRR: a compact register renaming scheme for SMT processors
ISPA'05 Proceedings of the Third international conference on Parallel and Distributed Processing and Applications
CRAM: coded registers for amplified multiporting
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
A register allocation framework for banked register files with access constraints
ACSAC'05 Proceedings of the 10th Asia-Pacific conference on Advances in Computer Systems Architecture
An optimized front-end physical register file with banking and writeback filtering
PACS'04 Proceedings of the 4th international conference on Power-Aware Computer Systems
Exploiting narrow values for energy efficiency in the register files of superscalar microprocessors
PATMOS'06 Proceedings of the 16th international conference on Integrated Circuit and System Design: power and Timing Modeling, Optimization and Simulation
A Hierarchical Thread Scheduler and Register File for Energy-Efficient Throughput Processors
ACM Transactions on Computer Systems (TOCS)
Shared-port register file architecture for low-energy VLIW processors
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.03 |
The register file access time is one of the critical delays in current superscalar processors. Its impact on processor performance is likely to increase in future processor generations, as they are expected to increase the issue width (which implies more register ports) and the size of the instruction window (which implies more registers), and to use some kind of multithreading. Under this scenario, the register file access time could be a dominant delay and a pipelined implementation would be desirable to allow for high clock rates.However, a multi-stage register file has severe implications for processor performance (e.g. higher branch misprediction penalty) and complexity (more levels of bypass logic). To tackle these two problems, in this paper we propose a register file architecture composed of multiple banks. In particular we focus on a multi-level organization of the register file, which provides low latency and simple bypass logic. We propose several caching policies and prefetching strategies and demonstrate the potential of this multiple-banked organization. For instance, we show that a two-level organization degrades IPC by 10% and 2% with respect to a non-pipelined single-banked register file, for SpecInt95 and SpecFP95 respectively, but it increases performance by 87% and 92% when the register file access time is factored in.