Message strip-mining heuristics for high speed networks

Authors:
Costin Iancu;Parry Husbands;Wei Chen
Affiliations:
Computational Research Division, Lawrence Berkeley National Laboratory;Computational Research Division, Lawrence Berkeley National Laboratory;Computer Science Division, University of California at Berkeley
Venue:
VECPAR'04 Proceedings of the 6th international conference on High Performance Computing for Computational Science
Year:
2004

Abstract

In this work we investigate how the compiler technique of message strip-mining performs in practice on contemporary high performance networks. Message strip-mining attempts to reduce the overall cost of communication in parallel programs by breaking up large message transfers into smaller ones that can be overlapped with computation. In practice, however, network resource constraints may negate the expected performance gains. By deriving a performance model and synthetic benchmarks, we determine how network and application characteristics influence the applicability of this optimization. We use these findings to determine heuristics to follow when performing this optimization on parallel programs. We propose strip-mining with variable block size as an alternative strategy that performs almost as well as a highly tuned fixed block strategy and has the advantage of being performance portable across systems and application input sets. We evaluate both techniques using synthetic benchmarks and an application from the NAS Parallel Benchmark suite.
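To make the trade-off in the abstract concrete, the following is a minimal sketch of a toy cost model. It is not the performance model derived in the paper. It assumes a fixed per-message CPU overhead (`overhead`), a per-element network transfer time (`gap`), and a per-element compute time (`compute`), and it assumes that all non-blocking transfers are issued up front and travel over a single serial link. All function and parameter names are hypothetical.

```python
def unstripped_time(n, overhead, gap, compute):
    """One blocking transfer of n elements, then compute on all of them."""
    return overhead + n * gap + n * compute


def strip_mined_time(n, block, overhead, gap, compute):
    """Fixed-block strip-mining under the toy model described above.

    Assumptions (illustrative only): every message costs `overhead` of CPU
    time to issue, the network moves one message at a time, and computing
    on a block can start only after that block has fully arrived.
    """
    sizes = [min(block, n - start) for start in range(0, n, block)]

    cpu = 0.0        # time at which the CPU becomes free
    net_free = 0.0   # time at which the network link becomes free
    arrivals = []

    # Issue every non-blocking transfer; the issue overhead is paid on the CPU.
    for size in sizes:
        cpu += overhead
        start = max(cpu, net_free)
        net_free = start + size * gap
        arrivals.append(net_free)

    # Consume the blocks in order, waiting for each one if it is still in flight.
    for size, arrived in zip(sizes, arrivals):
        cpu = max(cpu, arrived) + size * compute

    return cpu


if __name__ == "__main__":
    n, overhead, gap, compute = 1000, 10, 1, 1
    print("unstripped:", unstripped_time(n, overhead, gap, compute))
    for block in (1, 10, 100, 1000):
        print("block", block, "->", strip_mined_time(n, block, overhead, gap, compute))
```

Under these assumed parameters, a moderate block size hides most of the transfer behind computation, while a very small block size lets per-message overhead dominate and makes the strip-mined version slower than the single transfer. That is the kind of effect the abstract refers to when it says resource constraints may negate the expected gains.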