Design Alternatives of Multithreaded Architecture

Authors:
Avi Mendelson;Michael Bekerman
Affiliations:
-;-
Venue:
International Journal of Parallel Programming
Year:
1999

Abstract

This paper compares two possible implementations of a multithreaded architecture and proposes a new architecture that combines the flexibility of the first with the low hardware complexity of the second. We present a performance analysis and a step-by-step complexity analysis of two design alternatives: dynamic inter-thread resource scheduling and static resource allocation. We then introduce a new multithreaded architecture based on a new scheduling mechanism called “semi-static” scheduling. We show that with two concurrent threads the dynamic scheduling processor achieves 5 to 45% higher performance than the static one, at the cost of a much more complicated design. Our analysis indicates that, for a relatively high number of execution resources, the complexity of the dynamic scheduling logic will inevitably require design compromises. Moreover, high chip-wide communication time and an incomplete bypassing network will limit dynamic scheduling and reduce its performance advantage. The static scheduling architecture, on the other hand, achieves low resource utilization. The semi-static architecture uses compiler techniques to exploit patterns of program parallelism and introduces a new hardware mechanism in order to achieve performance close to that of dynamic scheduling without significantly increasing the hardware complexity of the static design. It statically assigns some of the functional units but dynamically schedules the most performance-critical functional units on a medium-grain basis.
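
As a rough illustration of the three allocation policies the abstract contrasts, the toy model below counts how many operations two threads can issue over a short trace. It is a sketch under strong assumptions and is not the paper's simulator: the demand trace, the unit counts, and all function names are hypothetical, a single interchangeable unit type is assumed, and the shared pool is arbitrated every cycle rather than on the medium-grain basis the abstract describes.

```python
# Toy model (hypothetical, not from the paper): each cycle, every thread
# requests some number of functional units; we count how many requests
# each allocation policy can satisfy.

def issued_static(demands, units_per_thread):
    """Static allocation: each thread owns a fixed set of units.
    Units left idle by one thread cannot be used by the other."""
    return sum(min(d, units_per_thread) for cycle in demands for d in cycle)


def issued_dynamic(demands, total_units):
    """Dynamic scheduling: all units form one pool shared by all threads."""
    return sum(min(sum(cycle), total_units) for cycle in demands)


def issued_semi_static(demands, private_units, shared_units):
    """Semi-static sketch: each thread owns private_units, and a small
    shared pool absorbs whatever demand the private units cannot serve."""
    total = 0
    for cycle in demands:
        served_privately = sum(min(d, private_units) for d in cycle)
        overflow = sum(max(d - private_units, 0) for d in cycle)
        total += served_privately + min(overflow, shared_units)
    return total


# Assumed per-cycle demand of (thread 0, thread 1); 4 units in every design.
demands = [(4, 0), (1, 3), (0, 4), (2, 2)]

static = issued_static(demands, units_per_thread=2)
semi = issued_semi_static(demands, private_units=1, shared_units=2)
dynamic = issued_dynamic(demands, total_units=4)

print(static, semi, dynamic)  # prints "11 14 16"
```

On this assumed trace the static design wastes units whenever the two threads have unbalanced demand, the fully shared pool serves all 16 requests, and the partly shared design lands in between. The numbers only show the ordering of the three policies in this model and say nothing about the 5 to 45% figure reported in the abstract.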