Register relocation: flexible contexts for multithreading

Authors:
Carl A. Waldspurger;William E. Weihl
Venue:
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Year:
1993

Abstract

Multithreading is an important technique that improves processor utilization by overlapping computation with the long-latency operations that commonly occur in multiprocessor systems. This paper presents register relocation, a new mechanism that efficiently supports flexible partitioning of the register file into variable-size contexts with minimal hardware support. Because the number of registers required varies from one thread context to another, this flexibility permits better utilization of scarce registers and allows more contexts to be resident, which in turn allows applications to tolerate shorter run lengths and longer latencies. Our experiments show that, compared to fixed-size hardware contexts, register relocation can improve processor utilization by a factor of two for many workloads.
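
The abstract does not describe how relocation is carried out, so the following is only a rough sketch of one way variable-size contexts could share a register file. It assumes that each context is rounded up to a power-of-two size and placed at a base aligned to that size, so that a context-relative register number can be combined with the base by a bitwise OR. The register file size, the function names, and the largest-first packing policy are all hypothetical choices made for illustration.

```python
# Hypothetical sketch: names, sizes, and the packing policy are illustrative
# assumptions, not details taken from the abstract.
REGISTER_FILE_SIZE = 64  # assumed total number of physical registers


def context_size(regs_needed):
    """Round a thread's register demand up to the next power of two."""
    size = 1
    while size < regs_needed:
        size *= 2
    return size


def relocate(base, reg):
    """Map a context-relative register number to a physical register.

    Assumes `base` is a multiple of the (power-of-two) context size and
    `reg` is smaller than that size, so OR behaves like base + reg.
    """
    return base | reg


def allocate(demands):
    """Pack contexts largest-first; return (base, size) per resident context."""
    placed = []
    next_free = 0
    for need in sorted(demands, reverse=True):
        size = context_size(need)
        if next_free + size > REGISTER_FILE_SIZE:
            continue  # this thread does not fit; a smaller one still might
        placed.append((next_free, size))  # base is a multiple of size here
        next_free += size
    return placed


if __name__ == "__main__":
    demands = [6, 10, 8, 12, 4, 16]  # registers needed by six threads
    contexts = allocate(demands)
    print(contexts)         # (base, size) pairs of the resident contexts
    print(relocate(48, 5))  # 53: register 5 of the context based at 48
```

With these assumed demands, five contexts fit in the 64 registers, whereas fixed-size contexts large enough for the biggest thread (16 registers each) would allow only four. Packing largest-first keeps every base aligned to its context size, which is what makes the OR-based mapping valid in this sketch.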