Evaluation of Various Node Configurations for Fine-grain Multithreading on Stock Processors

Authors:
Jin-Soo Kim, Soonhoi Ha, Chu Shik Jhon
Venue:
HPC-Asia '97: Proceedings of the High-Performance Computing on the Information Superhighway
Year:
1997

Abstract

Building multithreaded parallel machines from stock processors is increasingly attractive because of their high performance/price ratio. However, no quantitative analysis has yet been reported on the effectiveness of different node configurations or on their impact on overall performance. In this paper, we explore three node configurations in detail and compare their dynamic characteristics through instruction-level simulation of six benchmark programs. Our experiments show that employing a dedicated processor for communication and synchronization is a reasonable approach, because it can nearly double performance. We also identify several factors that limit the overall speedup.
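To see why offloading communication and synchronization can at best roughly double per-node performance, consider the following back-of-the-envelope model. It is a hypothetical simplification, not the paper's instruction-level simulation. It assumes that on a single-processor node, computation and message handling run serially. It further assumes that with a dedicated message processor the two overlap perfectly, and that coordination between the two processors costs nothing. The function name `offload_speedup` and its parameters are illustrative only.

```python
def offload_speedup(compute_time: float, message_time: float) -> float:
    """Idealized speedup from moving message handling and synchronization
    onto a dedicated processor.

    Hypothetical model (not the paper's simulator):
    - Single-processor node: compute and message work run serially,
      costing compute_time + message_time.
    - Dedicated message processor: the two overlap perfectly,
      costing max(compute_time, message_time).
    - Intra-node coordination between the two processors is assumed free.
    """
    if compute_time < 0 or message_time < 0:
        raise ValueError("times must be non-negative")
    serial = compute_time + message_time
    overlapped = max(compute_time, message_time)
    if overlapped == 0:
        # No work at all: define speedup as 1 (nothing to gain or lose).
        return 1.0
    return serial / overlapped


# Balanced work: the idealized bound of 2x is reached.
print(offload_speedup(50, 50))
# Compute-dominated work: the gain shrinks toward 1x.
print(offload_speedup(70, 30))
```

Under these assumptions the ratio (C + M) / max(C, M) never exceeds 2. It approaches 2 only when message handling takes about as long as computation and the two overlap almost completely. Any real overheads, such as processor-to-processor handoff within a node, pull the result below that bound.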