A Fine-Grain Multithreading Superscalar Architecture

Authors:
Mat Loikkanen; Nader Bagherzadeh
Venue:
PACT '96 Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques
Year:
1996

Citing 0
Cited 9

Improving 3D geometry transformations on a simultaneous multithreaded SIMD processor
ICS '01 Proceedings of the 15th international conference on Supercomputing

Asynchrony in parallel computing: from dataflow to multithreading
Progress in computer research

SMT Layout Overhead and Scalability
IEEE Transactions on Parallel and Distributed Systems

Asynchrony in parallel computing: from dataflow to multithreading
Progress in computer research

A survey of processors with explicit multithreading
ACM Computing Surveys (CSUR)

Multithreaded Parallel Computer Model with Performance Evaluation
IPDPS '00 Proceedings of the 15 IPDPS 2000 Workshops on Parallel and Distributed Processing

A Study of a Simultaneous Multithreaded Processor Implementation
Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing

Architecture optimization for multimedia application exploiting data and thread-level parallelism
Journal of Systems Architecture: the EUROMICRO Journal

Improving SMT performance scheduling processes
EUROMICRO-PDP'02 Proceedings of the 10th Euromicro conference on Parallel, distributed and network-based processing

Abstract

In this study we show that fine-grain multithreading is an effective way to increase instruction-level parallelism and to hide the latency of long-latency operations in a superscalar processor. Long-latency operations, such as remote memory references, cache misses, and multi-cycle floating-point calculations, are detrimental to performance because they typically cause a stall. Even superscalar processors, which can perform several operations in parallel, are vulnerable. A fine-grain multithreading paradigm and a unique multithreaded superscalar architecture are presented. Simulation results show significant speedup over single-threaded superscalar execution.
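
The latency-hiding argument in the abstract can be illustrated with a toy cycle-count model. The sketch below is not the architecture or simulator from the paper. It assumes a hypothetical in-order core in which every instruction depends on the previous instruction of its own thread, and in which ready threads are selected round-robin. The names `run`, `issue_width`, and the latency values in `work` are illustrative assumptions, not measurements.

```python
def run(threads, issue_width=1):
    """Return the cycle at which the last instruction completes.

    threads: list of instruction streams, each a list of latencies in cycles.
    Assumption: each instruction depends on the previous one in its own
    thread, so a thread cannot issue again until that result is available.
    """
    n = len(threads)
    pc = [0] * n          # index of the next instruction in each thread
    ready_at = [0] * n    # earliest cycle at which each thread may issue
    cycle = 0
    finish = 0
    start = 0             # round-robin starting point for thread selection

    while any(pc[t] < len(threads[t]) for t in range(n)):
        issued = 0
        for k in range(n):
            if issued == issue_width:
                break
            t = (start + k) % n
            if pc[t] < len(threads[t]) and ready_at[t] <= cycle:
                latency = threads[t][pc[t]]
                ready_at[t] = cycle + latency
                finish = max(finish, ready_at[t])
                pc[t] += 1
                issued += 1
        start = (start + 1) % n
        cycle += 1
    return finish


# Hypothetical instruction stream: short operations mixed with two
# long-latency ones (for example, cache misses costing 10 cycles each).
work = [1, 1, 10, 1, 1, 10]

single = run([work])            # one thread alone on the core
back_to_back = 4 * single       # four such threads executed one after another
interleaved = run([work] * 4)   # four threads interleaved cycle by cycle

print("one thread:", single)
print("four threads, back to back:", back_to_back)
print("four threads, interleaved:", interleaved)
```

In this model the single thread leaves the issue slot idle while each long-latency operation completes, and the interleaved run fills some of those idle cycles with instructions from other threads. The size of the gain depends entirely on the assumed latencies and thread count, so the output says nothing about the speedups reported in the paper.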