LogP: towards a realistic model of parallel computation
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
TAM—a compiler controlled threaded abstract machine
Journal of Parallel and Distributed Computing - Special issue on dataflow and multithreaded architectures
Evaluating the locality benefits of active messages
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
A Massively Parallel Multithreaded Architecture: DAVRID
ICCS '94 Proceedings of the 1994 IEEE International Conference on Computer Design: VLSI in Computer & Processors
Building Multithreaded Architectures with Off-the-Shelf Microprocessors
Proceedings of the 8th International Symposium on Parallel Processing
Analyzing the benefits of a separate processor to handle messages for fine-grain multithreading
SPDP '95 Proceedings of the 7th IEEE Symposium on Parallel and Distributed Processing
I-Structures: Data Structures for Parallel Computing
The Implementation of a Threaded Abstract Machine
Building multithreaded parallel machines from stock processors is increasingly attractive because of their high performance/price ratio. However, no quantitative analysis has been reported of the effectiveness of various node configurations or of their impact on overall performance. In this paper, we explore three node configurations in detail and compare their dynamic characteristics through instruction-level simulation with six benchmark programs. Our experiments show that employing a dedicated processor for communication and synchronization is a reasonable approach because it can almost double performance. We also present several factors that limit the overall speedup.