Software-Directed Register Deallocation for Simultaneous Multithreaded Processors

Authors:
Jack L. Lo;Sujay S. Parekh;Susan J. Eggers;Henry M. Levy;Dean M. Tullsen
Affiliations:
Transmeta Corp., Santa Clara, CA;Univ. of Washington, Seattle;Univ. of Washington, Seattle;-;Univ. of California at San Diego, La Jolla
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
1999

Abstract

This paper proposes and evaluates software techniques that increase register file utilization for simultaneous multithreading (SMT) processors. SMT processors require large register files to hold multiple thread contexts that can issue instructions out of order every cycle. By supporting better interthread sharing and management of physical registers, an SMT processor can reduce the number of registers required and can improve performance for a given register file size. Our techniques specifically target register deallocation. While out-of-order processors with register renaming are effective at knowing when a new physical register must be allocated, they have limited knowledge of when physical registers can be deallocated. We propose architectural extensions that permit the compiler and operating system to: 1) free registers immediately upon their last use, and 2) free registers allocated to idle thread contexts. Our results, based on detailed instruction-level simulations of an SMT processor, show that these techniques can increase performance significantly for register-intensive, multithreaded programs.
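
The deallocation gap described in the abstract can be illustrated with a toy model. The sketch below is not the paper's mechanism or simulator: it assumes a single thread, in-order renaming, and no speculation, and the function name `peak_physical_registers` and the `(dest, dead_after)` program encoding are hypothetical. It compares the peak number of physical registers held when a register is freed only once its architectural register is redefined (a simplified stand-in for conventional renaming) against freeing it at a compiler-marked last use.

```python
def peak_physical_registers(program, free_at_last_use):
    """Return the peak number of physical registers in use.

    program: list of (dest, dead_after) pairs, where dest is the architectural
    register written and dead_after lists architectural registers whose values
    are last used by this instruction (hypothetical compiler annotation).
    """
    mapping = {}     # architectural register -> physical register id
    in_use = set()   # physical registers currently allocated
    next_phys = 0
    peak = 0
    for dest, dead_after in program:
        # Software-directed mode: collect physical registers whose last use is here.
        dying = []
        if free_at_last_use:
            dying = [mapping.pop(r) for r in dead_after if r in mapping]
        # Rename the destination to a fresh physical register.
        old = mapping.get(dest)
        mapping[dest] = next_phys
        in_use.add(next_phys)
        next_phys += 1
        peak = max(peak, len(in_use))
        # Conventional rule (simplified): the previous mapping is freed only
        # when the architectural register is redefined.
        if old is not None:
            in_use.discard(old)
        # Assumed extension: free the values marked dead by this instruction.
        for p in dying:
            in_use.discard(p)
    return peak


# Five temporaries in distinct architectural registers, none ever redefined.
program = [
    ("r1", []),
    ("r2", []),
    ("r3", ["r1", "r2"]),  # r3 = r1 + r2; r1 and r2 are dead afterwards
    ("r4", []),
    ("r5", ["r3", "r4"]),  # r5 = r3 * r4; r3 and r4 are dead afterwards
]

conventional_peak = peak_physical_registers(program, free_at_last_use=False)
early_free_peak = peak_physical_registers(program, free_at_last_use=True)
```

In this model the conventional rule holds all five registers because no architectural register is ever rewritten, while last-use freeing peaks at three. A real SMT pipeline would also need to account for out-of-order completion and misspeculation, which this sketch ignores.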