Unicast-Based Multicast Communication in Wormhole-Routed Networks
IEEE Transactions on Parallel and Distributed Systems
Static and Run-Time Algorithms for All-to-Many Personalized Communication on Permutation Networks
IEEE Transactions on Parallel and Distributed Systems
Processor Mapping Techniques Toward Efficient Data Redistribution
IEEE Transactions on Parallel and Distributed Systems
Efficient multicast in all-port wormhole-routed hypercubes
Journal of Parallel and Distributed Computing
Automatic data layout for high performance Fortran
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Dynamic data partitioning for distributed-memory multicomputers
Journal of Parallel and Distributed Computing - Special issue on compilation techniques for distributed memory systems
Optimizations for efficient array redistribution on distributed memory multicomputers
Journal of Parallel and Distributed Computing - Special issue on compilation techniques for distributed memory systems
A Basic-Cycle Calculation Technique for Efficient Dynamic Data Redistribution
IEEE Transactions on Parallel and Distributed Systems
Efficient Algorithms for Block-Cyclic Array Redistribution Between Processor Sets
IEEE Transactions on Parallel and Distributed Systems
CP-PACS: a massively parallel processor at the University of Tsukuba
Parallel Computing - Special Anniversary issue
A Generalized Basic-Cycle Calculation Method for Efficient Array Redistribution
IEEE Transactions on Parallel and Distributed Systems
Contention-free communication scheduling for array redistribution
Parallel Computing
A Generalized Processor Mapping Technique for Array Redistribution
IEEE Transactions on Parallel and Distributed Systems
Efficient Algorithms for Array Redistribution
IEEE Transactions on Parallel and Distributed Systems
Efficient Algorithms for Multi-dimensional Block-Cyclic Redistribution of Arrays
ICPP '97 Proceedings of the international Conference on Parallel Processing
Automatic Selection of Dynamic Data Partitioning Schemes for Distributed-Memory Multicomputers
LCPC '95 Proceedings of the 8th International Workshop on Languages and Compilers for Parallel Computing
Efficient Block Cyclic Data Redistribution
Euro-Par '96 Proceedings of the Second International Euro-Par Conference on Parallel Processing - Volume I
Efficient Algorithms for Block-Cyclic Redistribution of Arrays
SPDP '96 Proceedings of the 8th IEEE Symposium on Parallel and Distributed Processing (SPDP '96)
Scheduling Messages For Data Redistribution: An Experimental Study
International Journal of High Performance Computing Applications
A flexible processor mapping technique toward data localization for block-cyclic data redistribution
The Journal of Supercomputing
Message scheduling for array re-decomposition on distributed memory systems
Future Generation Computer Systems
Contention-free communication scheduling for group communication in data parallelism
OTM'07 Proceedings of the 2007 OTM confederated international conference on On the move to meaningful internet systems: CoopIS, DOA, ODBASE, GADA, and IS - Volume Part II
Efficient multidimensional data redistribution for resizable parallel computations
ISPA'07 Proceedings of the 5th international conference on Parallel and Distributed Processing and Applications
Hi-index | 0.00 |
Many scientific applications require array redistribution when the programs run on distributed memory parallel computers. It is essential to use efficient algorithms for redistribution, otherwise the performance of the programs will degrade considerably. The redistribution overheads consist of two parts: index computation and inter-processor communication. If there is no communication scheduling in a redistribution routine, the inter-processor communication will incur a larger communication idle time when there exists node contention and/or difference among message lengths during one particular communication step. In order to solve this problem, in this paper, we propose an efficient scheduling scheme that not only minimizes the number of communication steps and eliminates node contention, but also minimizes the difference of message lengths in each communication step. Thus, the communication idle time is reduced in redistribution routines.