A Framework for Exploiting Task and Data Parallelism on Distributed Memory Multicomputers

Authors:
Shankar Ramaswamy;Sachin Sapatnekar;Prithviraj Banerjee
Affiliations:
Transarc Corp., Pittsburgh, PA;Univ. of Minnesota, Minneapolis;Northwestern Univ., Evanston, IL
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
1997

Abstract

Distributed Memory Multicomputers (DMMs), such as the IBM SP-2, the Intel Paragon, and the Thinking Machines CM-5, offer significant advantages over shared memory multiprocessors in terms of cost and scalability. Unfortunately, using all the available computational power in these machines requires a tremendous programming effort from users, which creates a need for sophisticated compiler and run-time support for distributed memory machines. In this paper, we explore a new compiler optimization for regular scientific applications: the simultaneous exploitation of task and data parallelism. Our optimization is implemented as part of the PARADIGM HPF compiler framework we have developed. The intuitive idea behind the optimization is to use task parallelism to control the degree of data parallelism of individual tasks. This improves performance because data parallelism provides diminishing returns as the number of processors is increased. By controlling the number of processors used for each data parallel task in an application and by executing these tasks concurrently, we make program execution more efficient and, therefore, faster. A practical implementation of a task and data parallel execution scheme on a distributed memory multicomputer also involves data redistribution, which incurs overhead. However, as our experimental results show, this overhead is not a problem; execution of a program using task and data parallelism together can be significantly faster than its execution using data parallelism alone. This makes our proposed optimization practical and extremely useful.
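
The diminishing-returns argument in the abstract can be illustrated with a toy calculation. The sketch below is not the cost model or the allocation algorithm of the paper; it assumes a hypothetical model in which a task with a given amount of work takes `work / procs` time for computation plus a communication term that grows linearly with the number of processors. The function names, the `overhead` coefficient, and the `redistribution` cost are all illustrative assumptions.

```python
def task_time(work, procs, overhead=0.5):
    """Hypothetical cost model (an assumption, not taken from the paper):
    perfectly divided computation plus a per-processor communication term."""
    return work / procs + overhead * procs


def data_parallel_only(works, total_procs):
    """Run the tasks one after another, each on all processors."""
    return sum(task_time(w, total_procs) for w in works)


def mixed_parallel(works, allocation, redistribution=0.0):
    """Run independent tasks concurrently, task i on allocation[i] processors.
    The finish time is that of the slowest task plus an assumed, fixed
    data redistribution cost."""
    if len(works) != len(allocation):
        raise ValueError("need one processor count per task")
    slowest = max(task_time(w, p) for w, p in zip(works, allocation))
    return slowest + redistribution


if __name__ == "__main__":
    works = [256, 256]  # two independent tasks of equal size (made-up numbers)
    print(data_parallel_only(works, 16))                        # 16 processors per task, in sequence
    print(mixed_parallel(works, [8, 8], redistribution=2.0))    # 8 processors per task, concurrently
```

Under these made-up numbers, running both tasks concurrently on 8 processors each finishes sooner than running them in sequence on all 16, even after paying the assumed redistribution cost. With a smaller `overhead` coefficient the gap narrows, which is why the benefit depends on how quickly data parallel efficiency degrades on a given machine.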