Reducing register pressure in SMT processors through L2-miss-driven early register release

Authors:
Joseph J. Sharkey;Jason Loew;Dmitry V. Ponomarev
Affiliations:
Assured Information Security Inc., Rome, NY;State University of New York, Binghamton, Binghamton, NY;State University of New York, Binghamton, Binghamton, NY
Venue:
ACM Transactions on Architecture and Code Optimization (TACO)
Year:
2008

Citing 25
Cited 2

Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
The SimpleScalar tool set, version 2.0

ACM SIGARCH Computer Architecture News
Software-Directed Register Deallocation for Simultaneous Multithreaded Processors

IEEE Transactions on Parallel and Distributed Systems
Implementation of precise interrupts in pipelined processors

ISCA '85 Proceedings of the 12th annual international symposium on Computer architecture
Low-complexity reorder buffer architecture

ICS '02 Proceedings of the 16th international conference on Supercomputing
Handling long-latency loads in a simultaneous multithreading processor

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Automatically characterizing large scale program behavior

Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Cherry: checkpointed early resource recycling in out-of-order microprocessors

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Virtual-Physical Registers

HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Front-End Policies for Improved Issue Efficiency in SMT Processors

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Hardware Schemes for Early Register Release

ICPP '02 Proceedings of the 2002 International Conference on Parallel Processing
Loose Loops Sink Chips

HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Checkpoint Processing and Recovery: Towards Scalable Large Instruction Window Processors

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Exploiting Value Locality in Physical Register Files

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Physical Register Inlining

Proceedings of the 31st annual international symposium on Computer architecture
Late Allocation and Early Release of Physical Registers

IEEE Transactions on Computers
Continual flow pipelines

ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Increasing Processor Performance Through Early Register Release

ICCD '04 Proceedings of the IEEE International Conference on Computer Design
Dynamically Controlled Resource Allocation in SMT Processors

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Checkpointed Early Load Retirement

HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
How to Fake 1000 Registers

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
ReSlice: Selective Re-Execution of Long-Retired Misspeculated Instructions Using Forward Slicing

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Learning-Based SMT Processor Resource Distribution via Hill-Climbing

Proceedings of the 33rd annual international symposium on Computer Architecture
An L2-miss-driven early register deallocation for SMT processors

Proceedings of the 21st annual international conference on Supercomputing

Adaptive instruction dispatching techniques for Simultaneous Multi-Threading (SMT) processors

Computers and Electrical Engineering
Recalling instructions from idling threads to maximize resource utilization for simultaneous multi-threading processors

Computers and Electrical Engineering

Quantified Score

Hi-index	0.00

Visualization

Abstract

The register file is one of the most critical datapath components limiting the number of threads that can be supported on a simultaneous multithreading (SMT) processor. To allow the use of smaller register files without degrading performance, techniques that maximize the efficiency of using registers through aggressive register allocation/deallocation can be considered. In this article, we propose a novel technique to early deallocate physical registers allocated to threads which experience L2 cache misses. This is accomplished by speculatively committing the load-independent instructions and deallocating the registers corresponding to the previous mappings of their destinations, without waiting for the cache miss request to be serviced. The early deallocated registers are then made immediately available for allocation to instructions within the same thread as well as within other threads, thus improving the overall processor throughput. On the average across the simulated mixes of multiprogrammed SPEC 2000 workloads, our technique results in 33% improvement in throughput and 25% improvement in terms of harmonic mean of weighted IPCs over the baseline SMT with the state-of-the-art DCRA policy. This is achieved without creating checkpoints, maintaining per-register counters of pending consumers, performing tag rebroadcasts, register remappings, and/or additional associative searches.