Distributed Memory Multicomputers (DMMs), such as the IBM SP-2, the Intel Paragon, and the Thinking Machines CM-5, offer significant advantages over shared-memory multiprocessors in terms of cost and scalability. Unfortunately, exploiting all of the available computational power of these machines demands a tremendous programming effort from users, which creates a need for sophisticated compiler and run-time support for distributed memory machines. In this paper, we explore a new compiler optimization for regular scientific applications: the simultaneous exploitation of task and data parallelism. Our optimization is implemented as part of the PARADIGM HPF compiler framework we have developed. The intuitive idea behind the optimization is to use task parallelism to control the degree of data parallelism of individual tasks. This improves performance because data parallelism yields diminishing returns as the number of processors is increased. By controlling the number of processors assigned to each data-parallel task in an application and by executing these tasks concurrently, we make program execution more efficient and, therefore, faster. A practical implementation of combined task- and data-parallel execution on a distributed memory multicomputer also requires data redistribution between tasks, which incurs an overhead. However, our experimental results show that this overhead is not prohibitive: a program executed with task and data parallelism together can run significantly faster than the same program executed with data parallelism alone. This makes our proposed optimization practical and extremely useful.
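The argument that concurrent data-parallel tasks on processor subsets can beat pure data parallelism can be illustrated with a toy cost model. This is a minimal sketch, not the PARADIGM compiler's actual performance model: it assumes an Amdahl-style execution time with a hypothetical serial fraction `serial_frac`, and compares running two independent unit-work tasks one after another on all processors versus concurrently on half the processors each.

```python
def time_on(p, serial_frac=0.1):
    """Execution time of one unit-work data-parallel task on p processors.

    Amdahl-style model (hypothetical): a fixed serial fraction plus a
    perfectly parallelizable remainder. Diminishing returns follow because
    the serial fraction does not shrink as p grows.
    """
    return serial_frac + (1.0 - serial_frac) / p

P = 16

# Pure data parallelism: the two tasks run back-to-back, each on all P processors.
sequential = time_on(P) + time_on(P)

# Mixed task and data parallelism: the two tasks run concurrently,
# each on P/2 processors; total time is that of the slower task.
concurrent = max(time_on(P // 2), time_on(P // 2))

print(f"sequential = {sequential:.4f}, concurrent = {concurrent:.4f}")
```

With these illustrative numbers the concurrent schedule finishes sooner (0.2125 vs. 0.3125 time units), capturing the paper's point that shrinking each task's processor count costs little once data parallelism has saturated, while overlapping the tasks recovers that cost and more. A real schedule would also have to charge for the data redistribution between tasks, which the paper measures experimentally.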