A Framework for Exploiting Task and Data Parallelism on Distributed Memory Multicomputers

  • Authors:
  • Shankar Ramaswamy;Sachin Sapatnekar;Prithviraj Banerjee

  • Affiliations:
  • Transarc Corp., Pittsburgh, PA;Univ. of Minnesota, Minneapolis;Northwestern Univ., Evanston, IL

  • Venue:
  • IEEE Transactions on Parallel and Distributed Systems
  • Year:
  • 1997

Quantified Score

Hi-index 0.00

Visualization

Abstract

Distributed Memory Multicomputers (DMMs), such as the IBM SP-2, the Intel Paragon, and the Thinking Machines CM-5, offer significant advantages over shared memory multiprocessors in terms of cost and scalability. Unfortunately, the utilization of all the available computational power in these machines involves a tremendous programming effort on the part of users, which creates a need for sophisticated compiler and run-time support for distributed memory machines. In this paper, we explore a new compiler optimization for regular scientific applications-the simultaneous exploitation of task and data parallelism. Our optimization is implemented as part of the PARADIGM HPF compiler framework we have developed. The intuitive idea behind the optimization is the use of task parallelism to control the degree of data parallelism of individual tasks. The reason this provides increased performance is that data parallelism provides diminishing returns as the number of processors used is increased. By controlling the number of processors used for each data parallel task in an application and by concurrently executing these tasks, we make program execution more efficient and, therefore, faster. A practical implementation of a task and data parallel scheme of execution for an application on a distributed memory multicomputer also involves data redistribution. This data redistribution causes an overhead. However, as our experimental results show, this overhead is not a problem; execution of a program using task and data parallelism together can be significantly faster than its execution using data parallelism alone. This makes our proposed optimization practical and extremely useful.