The construction of efficient parallel programs usually requires expert knowledge in the application area and a deep insight into the architecture of a specific parallel machine. Often, the resulting performance is not portable, i.e., a program that is efficient on one machine is not necessarily efficient on another machine with a different architecture. Transformation systems provide a more flexible solution. They start with a specification of the application problem and allow the generation of efficient programs for different parallel machines. The programmer has to give an exact specification of the algorithm expressing the inherent degree of parallelism and is relieved of the low-level details of the architecture. In this article, we propose such a transformation system with an emphasis on the exploitation of data parallelism combined with a hierarchically organized structure of task parallelism. Starting with a specification of the maximum degree of task and data parallelism, the transformations generate a specification of a parallel program for a specific parallel machine. The transformations are based on a cost model and are applied in a predefined order, fixing the most important design decisions, such as the scheduling of independent multitask activations, data distributions, pipelining of tasks, and assignment of processors to task activations. We demonstrate the usefulness of the approach with examples from scientific computing.
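The core design decision described above, assigning processors to independent task activations based on a cost model, can be illustrated with a small sketch. The runtime model `cost` and the function names below are illustrative assumptions, not the actual cost model of the proposed system; the sketch merely shows how, for two concurrently executed independent tasks, a split of the processor set can be chosen to minimize the predicted makespan.

```python
# Hypothetical sketch of cost-model-driven processor assignment for two
# independent task activations executed concurrently on disjoint
# processor groups. The cost function is a toy model (assumption),
# not the system's actual cost model.

def cost(work, p, beta=1e-3):
    """Toy runtime model: parallel compute time (work / p) plus a
    communication overhead term that grows with the group size p."""
    return work / p + beta * p

def best_split(work_a, work_b, total_procs):
    """Try every split of total_procs between the two tasks; the
    predicted makespan is the runtime of the slower group."""
    best = None
    for p_a in range(1, total_procs):
        p_b = total_procs - p_a
        makespan = max(cost(work_a, p_a), cost(work_b, p_b))
        if best is None or makespan < best[0]:
            best = (makespan, p_a, p_b)
    return best

if __name__ == "__main__":
    # The task with three times the work receives the larger group.
    print(best_split(100.0, 300.0, 16))
```

In a real transformation system, such a search would be carried out over all independent activations in a scheduling step, with the cost model parameterized by the target machine; the exhaustive loop here stands in for that decision procedure.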