The overall performance of cluster systems depends heavily on the pattern and volume of communication between the nodes. Asynchronous (nonblocking) message passing can improve performance because it lets communication overlap with computation, hiding part of the communication overhead. This paper develops an analytical model of the performance of asynchronous communication in a small, fully switched cluster environment. The model's parameters can be identified from measurable program and hardware characteristics, so the model can be used to predict the performance of complex parallel applications. The paper's main contribution is a description of how parallel communication channels affect the effective bandwidth of a single node. The model is validated by comparing the predicted and measured performance of two different broadcast primitives over a range of message sizes, as a function of the number of participating nodes.
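The overlap argument can be illustrated with a minimal sketch. This is not the model developed in the paper, whose equations the abstract does not give. It assumes a generic linear cost model (fixed latency plus message size divided by bandwidth) and a fixed CPU overhead for starting a nonblocking transfer. All function names, parameter names, and numeric values are hypothetical.

```python
def transfer_time(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Assumed linear cost model: fixed latency plus size over bandwidth."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


def blocking_time(compute_s, comm_s):
    """Blocking send: the CPU waits for the transfer, so the costs add up."""
    return compute_s + comm_s


def nonblocking_time(compute_s, comm_s, overhead_s):
    """Nonblocking send: the CPU pays overhead_s to start the transfer,
    then the remaining transfer time overlaps with computation."""
    return overhead_s + max(compute_s, comm_s - overhead_s)


# Illustrative values only (assumptions, not measurements from the paper):
# a 1 MB message, 100 microseconds of latency, 12.5 MB/s of bandwidth.
comm = transfer_time(1_000_000, 100e-6, 12.5e6)
blocking = blocking_time(0.05, comm)
overlapped = nonblocking_time(0.05, comm, 0.01)
print(f"transfer:    {comm:.4f} s")
print(f"blocking:    {blocking:.4f} s")
print(f"nonblocking: {overlapped:.4f} s")
```

Under these assumptions the nonblocking variant never takes longer than the blocking one as long as the startup overhead does not exceed the transfer time, and the gain is largest when computation and communication times are of similar size.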