One of the most fundamental problems facing automatic parallelization tools is finding an optimal domain decomposition for a given application. For regular domain problems (such as simple matrix manipulations), this task may seem trivial. However, communication costs in message passing programs often depend significantly on the memory layout of the data blocks to be transmitted. As a consequence, straightforward domain decompositions may be nonoptimal. In this paper, we introduce a new point-to-point communication model (called P-3PC, or the "Parameterized model based on the Three Paths of Communication") that is specifically designed to overcome this problem. In comparison with related models (e.g., LogGP), P-3PC is similar in complexity but more accurate in many situations. Although the model is aimed at MPI's standard point-to-point operations, it is applicable to similar message passing definitions as well. The effectiveness of the model is tested in a framework for automatic parallelization of low-level image processing applications. Experiments are performed on two Beowulf-type systems, each having a different interconnection network and a different MPI implementation. Results show that, where other models frequently fail, P-3PC correctly predicts the communication costs related to any type of domain decomposition.
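To make the memory-layout point concrete, the following minimal sketch counts how many contiguous memory runs a rectangular sub-block occupies in a row-major 2D array. It is an illustration only and is not the P-3PC model: the function name `contiguous_runs` and its parameters are hypothetical, and row-major storage is an assumption. It shows why a border row and a border column with the same number of elements can differ in transmission cost: the row is one contiguous run, while the column is scattered across many single-element runs.

```python
def contiguous_runs(height, width, rows, cols):
    """Return (number_of_runs, elements_per_run) for a sub-block.

    Assumes a row-major array of shape (height, width). `rows` and `cols`
    are half-open (start, stop) ranges selecting the sub-block.
    Hypothetical helper for illustration; not part of any library.
    """
    r0, r1 = rows
    c0, c1 = cols
    if not (0 <= r0 <= r1 <= height and 0 <= c0 <= c1 <= width):
        raise ValueError("sub-block lies outside the array")

    n_rows = r1 - r0
    n_cols = c1 - c0
    if n_rows == 0 or n_cols == 0:
        return (0, 0)

    if n_cols == width:
        # Full-width rows are adjacent in row-major memory: one run.
        return (1, n_rows * n_cols)

    # Partial rows are separated by the unselected columns,
    # so every selected row forms its own run.
    return (n_rows, n_cols)


if __name__ == "__main__":
    # Border of a 512 x 512 image exchanged with a neighbouring node.
    print(contiguous_runs(512, 512, rows=(0, 1), cols=(0, 512)))  # one border row
    print(contiguous_runs(512, 512, rows=(0, 512), cols=(0, 1)))  # one border column
```

Both borders hold 512 elements, yet the row is a single block while the column consists of 512 separate elements that typically must be packed or sent with a strided datatype. A cost model that looks only at message size cannot distinguish the two cases.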