SPARTAN: speculative avoidance of register allocations to transient values for performance and energy efficiency

Authors:
Deniz Balkan;Joseph Sharkey;Dmitry Ponomarev;Kanad Ghose
Affiliations:
State University of New York, Binghamton, NY;State University of New York, Binghamton, NY;State University of New York, Binghamton, NY;State University of New York, Binghamton, NY
Venue:
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Year:
2006

Citing 45
Cited 3

Register traffic analysis for streamlining inter-operation communication in fine-grain parallel processors

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Exploiting short-lived variables in superscalar processors

Proceedings of the 28th annual international symposium on Microarchitecture
Register renaming and dynamic speculation: an alternative approach

MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Exploiting dead value information

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Memory dependence prediction using store sets

Proceedings of the 25th annual international symposium on Computer architecture
A novel renaming scheme to exploit value temporal locality through physical register reuse and unification

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Speculation techniques for improving load related instruction scheduling

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Delaying physical register allocation through virtual-physical registers

Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Implementation of precise interrupts in pipelined processors

ISCA '85 Proceedings of the 12th annual international symposium on Computer architecture
Multiple-banked register file architectures

Proceedings of the 27th annual international symposium on Computer architecture
Register integration: a simple and efficient implementation of squash reuse

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Reducing the complexity of the register file in dynamic superscalar processors

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
The MIPS R10000 Superscalar Microprocessor

IEEE Micro
The Alpha 21264 Microprocessor

IEEE Micro
Recovery Mechanism for Latency Misprediction

Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Cherry: checkpointed early resource recycling in out-of-order microprocessors

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Characterizing and predicting value degree of use

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Reducing register ports for higher speed and lower energy

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Reducing register ports using delayed write-back queues and operand pre-fetch

ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Virtual-Physical Registers

HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Speculative Versioning Cache

HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Hardware Schemes for Early Register Release

ICPP '02 Proceedings of the 2002 International Conference on Parallel Processing
Banked multiported register files for high-frequency superscalar microprocessors

Proceedings of the 30th annual international symposium on Computer architecture
Profile-Based Dynamic Voltage Scheduling Using Program Checkpoints

Proceedings of the conference on Design, automation and test in Europe
Loose Loops Sink Chips

HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
A Scalable Register File Architecture for Dynamically Scheduled Processors

PACT '96 Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques
Reducing Datapath Energy through the Isolation of Short-Lived Operands

Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques
Scalable Hardware Memory Disambiguation for High ILP Processors

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Checkpoint Processing and Recovery: Towards Scalable Large Instruction Window Processors

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Exploiting Value Locality in Physical Register Files

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Use-Based Register Caching with Decoupled Indexing

Proceedings of the 31st annual international symposium on Computer architecture
A Content Aware Integer Register File Organization

Proceedings of the 31st annual international symposium on Computer architecture
Physical Register Inlining

Proceedings of the 31st annual international symposium on Computer architecture
Late Allocation and Early Release of Physical Registers

IEEE Transactions on Computers
Continual flow pipelines

ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Increasing Processor Performance Through Early Register Release

ICCD '04 Proceedings of the IEEE International Conference on Computer Design
Dynamic Strands: Collapsing Speculative Dependence Chains for Reducing Pipeline Communication

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Register Packing: Exploiting Narrow-Width Operands for Reducing Register File Pressure

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
A Small, Fast and Low-Power Register File by Bit-Partitioning

HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Performance, Energy, and Thermal Considerations for SMT and CMP Architectures

HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
RENO: A Rename-Based Instruction Optimizer

Proceedings of the 32nd annual international symposium on Computer Architecture
Understanding Scheduling Replay Schemes

HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
Compiler Directed Early Register Release

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Cherry-MP: Correctly Integrating Checkpointed Early Resource Recycling in Chip Multiprocessors

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Dynamically reducing pressure on the physical register file through simple register sharing

ISPASS '04 Proceedings of the 2004 IEEE International Symposium on Performance Analysis of Systems and Software

An L2-miss-driven early register deallocation for SMT processors

Proceedings of the 21st annual international conference on Supercomputing
Adaptive instruction dispatching techniques for Simultaneous Multi-Threading (SMT) processors

Computers and Electrical Engineering
Recalling instructions from idling threads to maximize resource utilization for simultaneous multi-threading processors

Computers and Electrical Engineering

Quantified Score

Hi-index	0.00

Visualization

Abstract

High-performance microprocessors use large, heavily-ported physical register files (RFs) to increase the instruction throughput. The high complexity and power dissipation of such RFs mainly stem from the need to maintain each and every result for a large number of cycles after the result generation. We observed that a significant fraction (about 45%) of the result values are never read from the register file and are not required to recover from branch mispredictions. In this paper, we propose SPARTAN - a set of micro-architectural extensions that predicts such transient values and in many cases completely avoids physical register allocations to them. We show that the transient values can be predicted as such with more than 97% accuracy on the average across simulated SPEC 2000 benchmarks. We evaluate the performance of SPARTAN on a variety of configurations and show that significant improvements in performance and energy-efficiency can be realized. Furthermore, we directly compare SPARTAN against a number of previously proposed schemes for register optimizations and show that our technique significantly outperforms all those schemes.