Data locality sensitivity of multithreaded computations on a distributed-memory multiprocessor

  • Authors:
  • Xin-Min Tian; Shashank Nemawarkar; Guang R. Gao; Herbert Hum

  • Affiliations:
  • IBM Toronto Laboratory, 1150 Eglinton Ave. East, North York, Toronto, Ontario, Canada, M3C 1H7; School of Computer Science, Dept. of Electrical Eng., McGill University, Montréal, Canada, H3A 2A7; School of Computer Science, McGill University, Montréal, Canada, H3A 2A7; MAP-Oregon, Intel Corporation, 2111 NE 25th Ave., JF1-91, Portland, Oregon

  • Venue:
  • CASCON '96 Proceedings of the 1996 conference of the Centre for Advanced Studies on Collaborative research
  • Year:
  • 1996

Abstract

The locality of data in parallel programs is known to have a strong impact on the performance of distributed-memory multiprocessor systems. The worse the locality of the access pattern, the worse the performance of single-threaded multiprocessor systems. The main reason is that lower locality increases the latency of network messages, so a processor waiting for these messages sits idle for long periods. A good data-partitioning strategy strives to improve the locality of accesses by reducing data sharing and network traffic. A certain amount of data sharing, however, is unavoidable in any non-trivial parallel program. To tune the performance of multiprocessor systems, compilers and programmers therefore expend significant effort on improving data partitioning.

The technique of multithreading has been promoted as an effective mechanism for hiding inter-processor communication and remote data access latencies by quickly switching among a set of ready threads. In this paper, we show that multithreading also provides immunity to performance variations caused by changes in the distribution of data locality in a distributed-memory multiprocessor. First, we propose two performance metrics to quantify the sensitivity of performance to data locality. Second, we perform a quantitative comparison of the data-locality sensitivity of single-threaded and multithreaded computations using designed experiments and benchmark programs. We perform these experiments on the 20-node EARTH-MANNA system. Our experimental results show not only that a multithreaded computation yields higher performance than the single-threaded computation, but also that its performance is more robust for the same data partitioning. That is, multithreading achieves lower data-locality sensitivity.
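The abstract does not define the paper's two sensitivity metrics, so the sketch below does not reproduce them. It is a hypothetical toy model, assuming a simple linear/saturation approximation of processor utilization, that illustrates the intuition: when a processor has several ready threads, a rise in the remote-access fraction (worse locality) can be absorbed by switching threads, whereas a single-threaded processor simply idles. The names `Workload`, `utilization`, and `locality_sensitivity` and all parameter values are illustrative assumptions, not taken from the paper or from EARTH-MANNA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical workload parameters for a toy latency-hiding model."""
    cycles_per_access: float  # compute cycles between consecutive memory accesses
    remote_fraction: float    # fraction of accesses that go to a remote node, in (0, 1]
    remote_latency: float     # round-trip latency of one remote access, in cycles
    switch_cost: float = 0.0  # cycles spent switching to another ready thread


def utilization(w: Workload, threads: int) -> float:
    """Approximate processor utilization with `threads` ready threads.

    Assumed model: a thread computes for R cycles between remote accesses.
    With few threads, utilization grows linearly, N*R / (R + C + L); once
    enough threads cover the latency, it saturates at R / (R + C).
    """
    if threads < 1:
        raise ValueError("threads must be >= 1")
    if not 0.0 < w.remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be in (0, 1]")
    run_length = w.cycles_per_access / w.remote_fraction  # R
    linear = threads * run_length / (run_length + w.switch_cost + w.remote_latency)
    saturated = run_length / (run_length + w.switch_cost)
    return min(linear, saturated)


def locality_sensitivity(good: Workload, poor: Workload, threads: int) -> float:
    """Hypothetical metric: relative utilization lost when locality degrades."""
    u_good = utilization(good, threads)
    return (u_good - utilization(poor, threads)) / u_good


if __name__ == "__main__":
    good = Workload(cycles_per_access=10, remote_fraction=0.1, remote_latency=200)
    poor = Workload(cycles_per_access=10, remote_fraction=0.2, remote_latency=200)
    for n in (1, 8):
        print(n, locality_sensitivity(good, poor, n))
```

With these made-up numbers, doubling the remote fraction costs the single-threaded processor about 40% of its utilization, while eight threads keep the processor saturated in both cases, so the toy sensitivity is zero. The real system's behavior depends on factors (network contention, scheduling, synchronization) that this sketch deliberately ignores.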