Exploiting short-lived variables in superscalar processors
Proceedings of the 28th annual international symposium on Microarchitecture
Exploiting dead value information
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Memory dependence prediction using store sets
Proceedings of the 25th annual international symposium on Computer architecture
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Multiple-banked register file architectures
Proceedings of the 27th annual international symposium on Computer architecture
Reducing the complexity of the register file in dynamic superscalar processors
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Cherry: checkpointed early resource recycling in out-of-order microprocessors
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Reducing register ports for higher speed and lower energy
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Reducing register ports using delayed write-back queues and operand pre-fetch
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Hardware Schemes for Early Register Release
ICPP '02 Proceedings of the 2002 International Conference on Parallel Processing
Banked multiported register files for high-frequency superscalar microprocessors
Proceedings of the 30th annual international symposium on Computer architecture
Profile-Based Dynamic Voltage Scheduling Using Program Checkpoints
Proceedings of the conference on Design, automation and test in Europe
HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Reducing Datapath Energy through the Isolation of Short-Lived Operands
Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques
Energy Efficient Asymmetrically Ported Register Files
ICCD '03 Proceedings of the 21st International Conference on Computer Design
Exploiting Value Locality in Physical Register Files
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
A Content Aware Integer Register File Organization
Proceedings of the 31st annual international symposium on Computer architecture
Proceedings of the 31st annual international symposium on Computer architecture
Increasing Processor Performance Through Early Register Release
ICCD '04 Proceedings of the IEEE International Conference on Computer Design
Dynamic Strands: Collapsing Speculative Dependence Chains for Reducing Pipeline Communication
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Dataflow Mini-Graphs: Amplifying Superscalar Capacity and Bandwidth
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Register Packing: Exploiting Narrow-Width Operands for Reducing Register File Pressure
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
A Small, Fast and Low-Power Register File by Bit-Partitioning
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Compiler Directed Early Register Release
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Bypass aware instruction scheduling for register file power reduction
Proceedings of the 2006 ACM SIGPLAN/SIGBED conference on Language, compilers, and tool support for embedded systems
Selective writeback: exploiting transient values for energy-efficiency and performance
Proceedings of the 2006 international symposium on Low power electronics and design
Register file caching for energy efficiency
Proceedings of the 2006 international symposium on Low power electronics and design
Serialization-Aware Mini-Graphs: Performance with Fewer Resources
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
A unified approach to eliminate memory accesses early
CASES '11 Proceedings of the 14th international conference on Compilers, architectures and synthesis for embedded systems
Hi-index | 0.00 |
Much of the complexity in today's superscalar microprocessors stems from the need to maintain the speculatively produced results within the on-chip storage components until these results can be safely discarded without endangering the reconstruction of the precise state or impeding the recovery from possible branch misspeculations. For this, modern designs use large, heavily-ported physical register files (RFs) to increase the instruction throughput. The high complexity and power dissipation of such RFs mainly stem from the need to maintain each and every result for a large number of cycles after the result generation. We observed that a significant fraction (about 45%) of the result values are delivered to their consumers via the bypass network (consumed "on-the-fly") and are never read out from the destination registers. In this paper, we first formulate conditions for identifying such transient values and describe their microarchitectural implementation; then we propose a technique to avoid the writeback of such transient values into the RF. With 64-entry integer and floating point register files, our technique achieves an 11% performance improvement and 29% reduction in the RF energy consumption compared to the baseline machine with the same number of registers. Furthermore, for the same performance target, the Selective Writeback scheme results in a 38% reduction in the energy consumption of the RF compared to the baseline machine.