Many applications from scientific computing and physical simulation can benefit from a mixed task and data parallel implementation on parallel machines with a distributed memory organization, but a pure data parallel implementation may also lead to faster execution times. Since the effort of writing a mixed task and data parallel implementation is large, an a priori estimation of the possible benefits of such an implementation on a given parallel machine would be useful. In this article, we propose an estimation method for the execution time that models computation and communication times by runtime formulas. The effect of concurrent message transmissions is captured by a contention factor for the specific target machine. To demonstrate the usefulness of the approach, we consider a complex method for the solution of ordinary differential equations that offers a potential for mixed task and data parallel execution. As distributed memory machines, we consider the Cray T3E and a Linux cluster.
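The following is a minimal sketch of the kind of a priori estimation the abstract describes, assuming simple linear runtime formulas; the function names, the formula shapes, and all constants are illustrative assumptions, not values taken from the article.

```python
# Hypothetical runtime formulas for comparing a pure data parallel execution
# with a mixed task and data parallel execution before implementing either.

def comp_time(n_ops, p, t_op):
    # Computation time: n_ops operations shared evenly by p processors.
    return (n_ops / p) * t_op

def comm_time(msg_bytes, startup, t_byte, contention):
    # One message: startup latency plus per-byte transfer cost, scaled by a
    # machine-specific contention factor for concurrent message transmissions.
    return contention * (startup + msg_bytes * t_byte)

def data_parallel_time(n_ops, p, t_op, msg_bytes, startup, t_byte, contention):
    # Pure data parallelism: all p processors exchange data globally,
    # here modeled as p - 1 messages on the critical path.
    return comp_time(n_ops, p, t_op) + \
        (p - 1) * comm_time(msg_bytes, startup, t_byte, contention)

def mixed_parallel_time(n_ops, p, n_tasks, t_op, msg_bytes, startup, t_byte,
                        contention):
    # Mixed task and data parallelism: the p processors are split into
    # n_tasks concurrent groups, so communication is confined to the
    # smaller groups of size p // n_tasks.
    g = p // n_tasks
    return comp_time(n_ops, p, t_op) + \
        (g - 1) * comm_time(msg_bytes, startup, t_byte, contention)
```

For example, with p = 16 processors and n_tasks = 4 task groups, the mixed version replaces 15 global messages per step by 3 group-internal ones; comparing the two predicted times for measured machine parameters is the kind of trade-off such an estimation method can quantify before any implementation effort is spent.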