Incorporating memory layout in the modeling of message passing programs

Authors:
F. J. Seinstra;D. Koelma
Affiliations:
Faculty of Science, University of Amsterdam, Amsterdam, The Netherlands;Faculty of Science, University of Amsterdam, Amsterdam, The Netherlands
Venue:
EUROMICRO-PDP'02 Proceedings of the 10th Euromicro conference on Parallel, distributed and network-based processing
Year:
2002

Citing (7):
LogP: towards a realistic model of parallel computation. PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Designing broadcasting algorithms in the Postal Model for message-passing systems. Proceedings of the 4th ACM symposium on Parallel algorithms and architectures
LogGP: incorporating long messages into the LogP model - one step closer towards a realistic model for parallel computation. Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
Data Locality Exploitation in the Decomposition of Regular Domain Problems. IEEE Transactions on Parallel and Distributed Systems
LoGPC: Modeling Network Contention in Message-Passing Programs. IEEE Transactions on Parallel and Distributed Systems
The distributed ASCI Supercomputer project. ACM SIGOPS Operating Systems Review
A Software Architecture for User Transparent Parallel Image Processing on MIMD Computers. Euro-Par '01 Proceedings of the 7th International Euro-Par Conference Manchester on Parallel Processing

Cited by (1):
A software architecture for user transparent parallel image processing. Parallel Computing - Parallel computing in image and video processing

Abstract

One of the most fundamental tasks of an automatic parallelization tool is to find an optimal domain decomposition for a given application. For regular domain problems (such as simple matrix manipulations) this task may seem trivial. However, communication costs in message passing programs often significantly depend on the memory layout of data blocks to be transmitted. As a consequence, straightforward domain decompositions may be non-optimal. In this paper we introduce a new point-to-point communication model (called P-3PC) that is specifically designed to overcome this problem. In comparison with related models (e.g., LogGP) P-3PC is similar in complexity, but more accurate in many situations. Although the model is aimed at MPI's standard point-to-point operations, it is applicable to similar message passing definitions as well. The effectiveness of the model is tested in a framework for automatic parallelization of imaging applications. Experiments are performed on two Beowulf-type systems, each having a different interconnection network, and a different MPI implementation. Results show that, where other models frequently fail, P-3PC correctly predicts the communication costs related to any type of domain decomposition.
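
The layout problem described in the abstract can be illustrated with a small sketch. It is not the P-3PC model, whose parameters are not given here; it only contrasts the two borders exchanged under a row-wise and a column-wise decomposition of a row-major image, and shows that a model driven by byte count alone (here the standard LogGP estimate o + (k-1)G + L + o) cannot tell them apart. The function names, the image size, and the parameter values are hypothetical and chosen purely for illustration.

```python
def loggp_time(k_bytes, L, o, G):
    """LogGP estimate for one k-byte point-to-point message.

    Depends only on the byte count, not on where the bytes sit in memory.
    """
    return o + (k_bytes - 1) * G + L + o


def border_layout(height, width, elem_size, split):
    """Describe one border of a row-major height x width image.

    Returns (total_bytes, contiguous_runs, stride_bytes).
    split="rows": horizontal slices, the border is one full row.
    split="cols": vertical slices, the border is one full column.
    """
    if split == "rows":
        # A row is a single contiguous block in row-major storage.
        return width * elem_size, 1, 0
    if split == "cols":
        # A column is `height` single elements, one per row,
        # separated by the length of a full row.
        return height * elem_size, height, width * elem_size
    raise ValueError(f"unknown split: {split!r}")


# Illustrative values only (assumed, not measured on any system).
PARAMS = {"L": 5.0, "o": 2.0, "G": 0.01}

row_bytes, row_runs, row_stride = border_layout(512, 512, 4, "rows")
col_bytes, col_runs, col_stride = border_layout(512, 512, 4, "cols")

row_estimate = loggp_time(row_bytes, **PARAMS)
col_estimate = loggp_time(col_bytes, **PARAMS)

print("row border:", row_bytes, "bytes in", row_runs, "run(s)")
print("col border:", col_bytes, "bytes in", col_runs, "run(s),",
      "stride", col_stride, "bytes")
print("byte-count-only estimates equal:", row_estimate == col_estimate)
```

For a square image both borders carry the same number of bytes, so the byte-count estimate is identical, yet the column border is scattered over many non-contiguous runs that must be packed or sent with a strided datatype. That gap between equal predictions and unequal real costs is the situation the abstract says a layout-aware model is meant to address.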