Fortran 90 handbook: complete ANSI/ISO reference
Compiling Fortran 90D/HPF for distributed memory MIMD computers
Journal of Parallel and Distributed Computing - Special issue on data parallel algorithms and programming
Scheduling Block-Cyclic Array Redistribution
IEEE Transactions on Parallel and Distributed Systems
Efficient Algorithms for Block-Cyclic Array Redistribution Between Processor Sets
IEEE Transactions on Parallel and Distributed Systems
Contention-free communication scheduling for array redistribution
Parallel Computing
Efficient communication sets generation for block-cyclic distribution on distributed-memory machines
Journal of Systems Architecture: the EUROMICRO Journal
Improving communication scheduling for array redistribution
Journal of Parallel and Distributed Computing
ICPP '05 Proceedings of the 2005 International Conference on Parallel Processing
An MPI prototype for compiled communication on Ethernet switched clusters
Journal of Parallel and Distributed Computing - Special issue: Design and performance of networks for super-, cluster-, and grid-computing: Part I
Communication Generation for Irregular Parallel Applications
PARELEC '06 Proceedings of the international symposium on Parallel Computing in Electrical Engineering
VFC: The Vienna Fortran Compiler
Scientific Programming
A Message Scheduling Scheme for All-to-All Personalized Communication on Ethernet Switched Clusters
IEEE Transactions on Parallel and Distributed Systems
A study of process arrival patterns for MPI collective operations
Proceedings of the 21st annual international conference on Supercomputing
Group communication significantly influences the performance of data-parallel applications. It is often required in two situations: array redistribution between computation phases, and array remapping after loop partitioning. Nevertheless, an important factor affecting the efficiency of group communication is often neglected: substantial communication idle time can arise when node contention occurs or message lengths differ within a single communication step. This paper develops an efficient scheduling strategy that uses the compile-time information provided by array subscripts, the array distribution pattern, and the array access period. Our strategy not only avoids inter-processor contention but also minimizes the real communication cost of each communication step. Our experimental results show that our strategy outperforms the conventional implementation of MPI_Alltoallv, all-to-all based scheduling, and greedy scheduling.
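The scheduling idea described in the abstract — grouping messages into steps so that no processor sends or receives more than one message per step, while keeping the longest message in each step small — can be illustrated with a simple greedy placement. This is only a hedged sketch of the general technique, not the paper's actual algorithm; the `schedule_messages` function and the `(src, dst, length)` message tuples are invented for the example:

```python
def schedule_messages(messages):
    """Greedily build a contention-free communication schedule.

    messages: list of (src, dst, length) tuples.  Messages are placed
    longest-first, so a long message is unlikely to be appended to a
    late step and inflate that step's cost (a step costs as much as its
    longest message).  Within a step, no processor appears more than
    once as a sender or more than once as a receiver, which avoids
    node contention.
    """
    steps = []  # each step: {"msgs": [...], "senders": set, "receivers": set}
    for src, dst, length in sorted(messages, key=lambda m: -m[2]):
        for step in steps:
            # Place the message in the first step where neither
            # endpoint is already busy.
            if src not in step["senders"] and dst not in step["receivers"]:
                step["msgs"].append((src, dst, length))
                step["senders"].add(src)
                step["receivers"].add(dst)
                break
        else:
            # No existing step can take it: open a new step.
            steps.append({"msgs": [(src, dst, length)],
                          "senders": {src}, "receivers": {dst}})
    return [s["msgs"] for s in steps]
```

For four messages among three processors — `[(0, 1, 4), (0, 2, 2), (1, 2, 3), (2, 1, 1)]` — the sketch yields two contention-free steps, and the total schedule cost is the sum of each step's longest message. A contention-aware compiler can apply the same placement at compile time, since array subscripts and distribution patterns determine the message set before the program runs.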