The locality of the data in parallel programs is known to have a strong impact on the performance of distributed-memory multiprocessor systems. The worse the locality of the access pattern, the worse the performance of single-threaded multiprocessor systems. The main reason is that lower locality increases the latency of network messages, so a processor waiting for these messages idles for long periods. A good data-partitioning strategy strives to improve the locality of accesses by reducing data sharing and network traffic. A certain amount of data sharing, however, is unavoidable in any non-trivial parallel program. To tune the performance of multiprocessor systems, compilers and programmers therefore expend significant effort on improving the data partitioning.

The technique of multithreading has been promoted as an effective mechanism for hiding inter-processor communication and remote data access latencies by quickly switching among a set of ready threads. In this paper, we show that multithreading also provides immunity to the performance variations caused by changes in data-locality distributions in a distributed-memory multiprocessor. First, we propose two performance metrics to quantify the sensitivity of performance to data locality. Second, we quantitatively compare the data-locality sensitivity of single-threaded and multithreaded computations using designed experiments and benchmark programs. We perform these experiments on the 20-node EARTH-MANNA system. Our experimental results show that a multithreaded computation not only yields higher performance than the single-threaded computation, but its performance is also more robust with respect to the data partitioning. That is, a lower data-locality sensitivity can be achieved with multithreading.
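The latency-hiding argument can be illustrated with a simple deterministic utilization model. This is a minimal sketch under stated assumptions, not the metrics or the measurement method of this paper: the function name `utilization`, its parameters, and all numeric values are hypothetical. The model assumes that every thread computes for a fixed run length, then issues one remote access with a fixed latency, and that switching threads has a fixed cost.

```python
def utilization(run_length, latency, switch_cost, threads):
    """Fraction of time the processor spends on useful work.

    Hypothetical textbook-style model (not taken from the paper):
    each thread runs for `run_length` cycles, then waits `latency`
    cycles for a remote access; a context switch costs `switch_cost`.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    if threads == 1:
        # Single-threaded: the processor idles for the whole latency.
        return run_length / (run_length + latency)
    # Linear regime: too few threads to cover the latency completely.
    linear = threads * run_length / (run_length + switch_cost + latency)
    # Saturated regime: latency is fully hidden, only switching is lost.
    saturated = run_length / (run_length + switch_cost)
    return min(linear, saturated)


# Assumed latencies standing in for good and poor data locality.
GOOD_LOCALITY_LATENCY = 50
POOR_LOCALITY_LATENCY = 400

for n in (1, 4):
    good = utilization(100, GOOD_LOCALITY_LATENCY, 10, n)
    poor = utilization(100, POOR_LOCALITY_LATENCY, 10, n)
    print(f"threads={n}: good={good:.3f} poor={poor:.3f} drop={good - poor:.3f}")
```

Under these assumed numbers, the utilization drop between good and poor locality is smaller with four threads than with one, which matches the qualitative claim that multithreading reduces sensitivity to data locality.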