The construction of efficient parallel programs usually requires expert knowledge in the application area and a deep insight into the architecture of a specific parallel machine. Often, the resulting performance is not portable, i.e., a program that is efficient on one machine is not necessarily efficient on another machine with a different architecture. Transformation systems provide a more flexible solution. They start with a specification of the application problem and allow the generation of efficient programs for different parallel machines. The programmer has to give an exact specification of the algorithm expressing the inherent degree of parallelism and is relieved of the low-level details of the architecture. In this article, we propose such a transformation system with an emphasis on the exploitation of data parallelism combined with a hierarchically organized structure of task parallelism. Starting with a specification of the maximum degree of task and data parallelism, the transformations generate a specification of a parallel program for a specific parallel machine. The transformations are based on a cost model and are applied in a predefined order, fixing the most important design decisions, such as the scheduling of independent multitask activations, data distributions, pipelining of tasks, and assignment of processors to task activations. We demonstrate the usefulness of the approach with examples from scientific computing.
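The core design decision described above, assigning processors to independent task activations based on a cost model, can be illustrated with a small sketch. The runtime model `cost` and the function names below are illustrative assumptions, not the actual cost model of the proposed system; the sketch merely shows how, for two concurrently executed independent tasks, a split of the processor set can be chosen to minimize the predicted makespan.

```python
# Hypothetical sketch of cost-model-driven processor assignment for two
# independent task activations executed concurrently on disjoint
# processor groups. The cost function is a toy model (assumption),
# not the system's actual cost model.

def cost(work, p, beta=1e-3):
    """Toy runtime model: parallel compute time (work / p) plus a
    communication overhead term that grows with the group size p."""
    return work / p + beta * p

def best_split(work_a, work_b, total_procs):
    """Try every split of total_procs between the two tasks; the
    predicted makespan is the runtime of the slower group."""
    best = None
    for p_a in range(1, total_procs):
        p_b = total_procs - p_a
        makespan = max(cost(work_a, p_a), cost(work_b, p_b))
        if best is None or makespan < best[0]:
            best = (makespan, p_a, p_b)
    return best

if __name__ == "__main__":
    # The task with three times the work receives the larger group.
    print(best_split(100.0, 300.0, 16))
```

In a real transformation system, such a search would be carried out over all independent activations in a scheduling step, with the cost model parameterized by the target machine; the exhaustive loop here stands in for that decision procedure.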